Abstract



We consider the downlink of a multi-cell system with multi-antenna base stations and single-antenna
user terminals, arbitrary base station cooperation clusters, distance-dependent propagation pathloss, and 
general "fairness" requirements. Base stations in the same cooperation cluster employ joint transmission 
with linear zero-forcing beamforming, subject to sum or per-base station power constraints. Inter-cluster 
interference is treated as noise at the user terminals. Analytic expressions for the system spectral efficiency 
are found in the large-system limit where both the numbers of users and antennas per base station tend 
to infinity with a given ratio. In particular, for the per-base station power constraint, we find new results 
in random matrix theory, yielding the squared Frobenius norm of submatrices of the Moore-Penrose 
pseudo-inverse for the structured non-i.i.d. channel matrix resulting from the cooperation cluster, user 
distribution, and path-loss coefficients. The analysis is extended to the case of non-ideal Channel State 
Information at the Transmitters (CSIT) obtained through explicit downlink channel training and uplink 
feedback. Specifically, our results illuminate the trade-off between the benefit of a larger number of 
cooperating antennas and the cost of estimating higher-dimensional channel vectors. Furthermore, our 
analysis leads to a new simplified downlink scheduling scheme that pre-selects the users according to 
probabilities obtained from the large-system results, depending on the desired fairness criterion. The 
proposed scheme performs close to the optimal (finite-dimensional) opportunistic user selection while 
requiring significantly less channel state feedback, since only a small fraction of pre-selected users must 
feed back their channel state information. 



Index Terms 



Large random matrices, linear zero-forcing beamforming, network MIMO, channel estimation, downlink
scheduling.

I. Introduction

The next generation of wireless communication systems (e.g., 802.16m [1], LTE-Advanced [2]) considers
multiuser MIMO (MU-MIMO) as one of the core technologies. A considerable research effort has been
dedicated to the performance evaluation of MU-MIMO systems in realistic cellular environments [3]-[5].
In a single-cell setting with perfect Channel State Information at the Transmitter (CSIT), the system
reduces to a vector Gaussian broadcast channel whose capacity region is completely characterized
[10]. However, in a multi-cell scenario, we are in the presence of a vector Gaussian broadcast and
interference channel, which is not yet fully understood in an information theoretic sense.

A simple and practical approach consists of treating Inter-Cell Interference (ICI) as noise. In this 
case, ICI may significantly limit the system capacity. A variety of inter-cell cooperation schemes have 
been proposed to mitigate ICI, ranging from a fully cooperative network MIMO [11]-[14] to partially
coordinated beamforming [15]-[17]. In this work, we focus on the network MIMO approach with limited
cooperation, where clusters of cooperating base stations (BSs) act as a single distributed MIMO transmitter 
and interference from other clusters is treated as noise. 

In a cellular environment, the received signal power is a polynomially decreasing function of the 
distance between transmitter and receiver, with a dynamic range typically larger than 30 dB [18]. Thus,
users close to the cell (or cluster of coordinated cells) boundary experience strong inter-cell interference,
whereas the desired signal is relatively weak. These "edge" users cannot simply be ignored by the system.
For example, maximizing the system sum-rate leads in general to a very unfair operating point, where 
the system resources are concentrated on users near the cell (or cluster) center. In contrast, fairness 
scheduling has been proposed and widely studied in order to achieve a desirable balance between
sum-rate and fairness (see for example [19]-[21] and references therein). Fairness scheduling can be
systematically implemented in the framework of stochastic network optimization [20]. Such fairness
scheduling algorithms dynamically allocate the system resources on a slot-by-slot basis, such that the
long-term average (or "ergodic") user rate point maximizes some suitable concave and componentwise
increasing network utility function [22]. Because the analytical characterization of the optimal ergodic rate
point for a given network utility function may be hopelessly complicated in a realistic scenario, the system
performance has so far been evaluated through computationally very intensive Monte Carlo simulation
[3]-[5], [13], [14], [23]-[26], where the actual scheduling algorithm evolves in time and the ergodic rates
are computed as time averages.

The capacity of a multi-cell network MIMO system under fairness criteria was evaluated in the
large-system limit in [27], assuming ideal channel state knowledge and the Gaussian Dirty Paper Coding
(DPC) transmission strategy. In [27], the asymptotic analysis method based on large random matrix results
was shown to be effective by comparison with finite-dimensional Monte Carlo simulation. In this
work, we apply a similar approach to Linear Zero-Forcing Beamforming (LZFB). It turns out that the
analysis in this case is significantly more complicated, in particular when the per-BS power constraint
is taken into account. In this paper we also extend our analysis to the case where the CSIT is obtained through
explicit training and MMSE estimation. Under these conditions, we obtain a lower bound on the achievable
ergodic rates (referred to as "throughput" in the following) that takes into account the overhead due
to training-based channel estimation. Several novel and important aspects are illuminated in this paper.



Specifically: 1) As in [27], our analysis allows precise performance evaluation of systems for which
brute-force Monte Carlo simulation would be very demanding. 2) By including the effect of training
and channel estimation, we can investigate the tradeoff between ICI reduction owing to BS cooperation
and the cost of estimating channels of increasingly large dimension. Unlike previous results that assumed
ideal CSIT at no cost [24], [25], [27], we observe that there exists an optimal cooperation cluster size
that depends on the channel coherence time and bandwidth, beyond which cooperative network MIMO is
not convenient, consistent with the finite-dimensional simulation findings of [26], [28]. 3) We provide
novel results in random matrix theory, in particular, related to the evaluation of the coefficients appearing
in the per-BS power constraint. 4) We use our asymptotic analysis in order to design a probabilistic
scheduling algorithm that randomly pre-selects the users with assigned probabilities obtained from the
large-system results, and therefore requires much less CSIT feedback than the standard opportunistic
scheduling scheme based on channel-driven user selection.

This last point deserves some remarks, since for the first time (to the authors' knowledge) asymptotic
results are used not only for performance analysis but also for system design in network MIMO.¹ The
standard approach to scheduling for downlink beamforming consists of having a large number of users
feed back their CSIT and selecting a subset of users with cardinality not larger than the number of
jointly coordinated transmit antennas, such that the channel vectors of the selected users have large
norm and are mutually approximately orthogonal [31], [32]. This multiuser diversity selection, combined
with LZFB precoding, is shown to attain the same performance as Gaussian DPC in the limit of a large
number of users and fixed number of transmit antennas. However, in this limit, the throughput per user
vanishes as $O(\log\log(n)/n)$, where $n$ is the number of users. Therefore, a more meaningful regime is one
in which the number of users is proportional to the number of antennas, yielding constant throughput
per user. This is in fact the large-system regime investigated in this paper. Comparing the results of
our asymptotic analysis with the Monte Carlo simulation of finite-dimensional systems, including user
selection as described above, we notice that multiuser diversity yields larger throughput per user for
low-dimensional systems, but this gain reduces as the system dimension grows. This is a manifestation of the
"channel hardening effect" noticed in [33], and agrees with the theoretical findings in [34], showing that
the probability of finding a subset of approximately orthogonal users vanishes as the system dimension
increases. Hence, as the system becomes large, there is diminishing return in selecting users from a large
set. In contrast, the cost of CSIT feedback grows at least linearly with the number of users feeding back
their estimated CSIT. Therefore, we advocate a probabilistic scheduling algorithm in which users are
pre-selected at random using the probabilities derived from our large-system analysis, reflecting the desired
fairness criterion, and only the selected users are required to feed back their CSIT. The performance of
this scheme is shown to be close to that of the much more costly full user selection scheme, and the gap
narrows as the system dimension increases (again, by the large-system limit and channel hardening
effect).

¹In the unrelated context of multiuser detection, asymptotic large-system results were used to design low-complexity linear
multiuser detectors based on polynomial approximations, where the polynomial coefficients were obtained from large random
matrix theory [29], [30].

In comparison with the concurrent literature, we notice that the LZFB MU-MIMO performance
analysis with non-ideal CSIT was extensively studied in the finite-dimensional case (see for example
[35]-[37]) and in the large-system limit (see for example [38]-[40]). Unlike concurrent works, our paper
focuses explicitly on the system optimization under fairness criteria in the multi-cell downlink with
inter-cell cooperation. This particular angle allows us to illuminate aspects that are not present in other
works, such as the distribution of the per-user throughput under fairness and, as a consequence, the design
of the random scheduling scheme described above.

The remainder of this paper is organized as follows. In Section II we describe the general
finite-dimensional system model including the arbitrary clustering of cooperative BSs, formulate the system
optimization problem, and provide its numerical solutions for a given channel realization. In Section
III we take the large system limit and present the large-system regime of the LZFB precoder and
the optimization algorithm for user selection and power allocation. The opportunistic fairness scheduling
scheme is also described in this section. The impact of non-perfect CSIT and training overhead is analyzed
in Section IV. Numerical results and the low-complexity randomized scheduling scheme are presented
in Section V, and some concluding remarks are given in Section VI. The lengthiest and most technical
derivations are relegated to the appendices.

II. Finite Dimensional System 

A. System Setup 

Consider a cellular system formed by $M$ BSs, with $\gamma N$ antennas each, and $KN$ single-antenna user
terminals spatially distributed in the coverage area. We assume that the users are divided into $K$
co-located "user groups" with $N$ users each. Users in the same group are statistically equivalent: they see
the same pathloss coefficients from all BSs and their small-scale fading channel coefficients are i.i.d.
The received signal vector $\mathbf{y}_k = [y_{k,1} \cdots y_{k,N}]^{\mathsf{T}} \in \mathbb{C}^{N}$ of the $k$-th user group is given by

$$\mathbf{y}_k = \sum_{m=1}^{M} \alpha_{m,k}\, \mathbf{H}_{m,k}^{\mathsf{H}}\, \mathbf{x}_m + \mathbf{n}_k, \tag{1}$$

where $\alpha_{m,k}$ and $\mathbf{H}_{m,k}$ denote the distance-dependent pathloss coefficient and the $\gamma N \times N$ small-scale channel
fading matrix from the $m$-th BS to the $k$-th user group, respectively, $\mathbf{x}_m = [x_{m,1} \cdots x_{m,\gamma N}]^{\mathsf{T}} \in \mathbb{C}^{\gamma N}$ is
the transmitted signal vector of the $m$-th BS, subject to the power constraint $\mathrm{tr}(\mathrm{Cov}(\mathbf{x}_m)) \le P_m$, and
$\mathbf{n}_k = [n_{k,1} \cdots n_{k,N}]^{\mathsf{T}} \in \mathbb{C}^{N}$ denotes the additive white Gaussian noise (AWGN) at the user receivers.
The elements of $\mathbf{n}_k$ and of $\mathbf{H}_{m,k}$ are i.i.d. $\mathcal{CN}(0, 1)$.

A cooperative cell arrangement with $L$ cooperation clusters is defined by the BS partition $\{\mathcal{M}_1, \ldots, \mathcal{M}_L\}$
of the BS set $\{1, \ldots, M\}$ and the corresponding user group partition $\{\mathcal{K}_1, \ldots, \mathcal{K}_L\}$ of the user group
set $\{1, \ldots, K\}$. We assume that the BSs in each cluster $\mathcal{M}_\ell$ act as a single distributed multi-antenna
transmitter with $\gamma |\mathcal{M}_\ell| N$ antennas, perfectly coordinated by a central cluster controller, and serve users
in groups $k \in \mathcal{K}_\ell$. The clusters do not cooperate and treat ICI from other clusters as noise. Assuming
that each BS operates at its maximum individual transmit power, the ICI plus noise power at any user
terminal in group $k \in \mathcal{K}_\ell$ is given by

$$\sigma_k^2 = 1 + \sum_{m \notin \mathcal{M}_\ell} \alpha_{m,k}^2 P_m. \tag{2}$$

Each cluster seeks to maximize its own objective function defined by the fairness scheduling. It is easy
to show that, under the above system assumptions, the selfish optimal strategy that operates at maximum
per-BS power is a Nash equilibrium of the system. At this Nash equilibrium, the clusters are effectively
decoupled since the effect that other clusters have on each cluster $\ell$ is captured by the ICI terms in (2)
that do not depend on the actual BS transmit covariances $\mathrm{Cov}(\mathbf{x}_m)$.

From the viewpoint of cluster $\ell$, the system is equivalent to a single-cell MIMO downlink channel
with a modified channel matrix and noise levels and a per-BS power constraint. Therefore, from now on
we focus on a given reference cluster $\ell = 1$ and, without loss of generality, we indicate the user groups
in the reference cluster as $k = 1, \ldots, A$, with $A = |\mathcal{K}_1|$, and the BSs in $\mathcal{M}_1$ as $m = 1, \ldots, B$ with
$B = |\mathcal{M}_1|$. After a convenient re-normalization of the channel coefficients, we arrive at the equivalent
channel model for the reference cluster given by

$$\mathbf{y} = \mathbf{H}^{\mathsf{H}} \mathbf{x} + \mathbf{z} \tag{3}$$

with $\mathbf{y} \in \mathbb{C}^{AN}$, $\mathbf{x} \in \mathbb{C}^{\gamma BN}$, $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{AN})$, and the channel matrix $\mathbf{H} \in \mathbb{C}^{\gamma BN \times AN}$ is given by

$$\mathbf{H} = \begin{bmatrix} \beta_{1,1}\mathbf{H}_{1,1} & \cdots & \beta_{1,A}\mathbf{H}_{1,A} \\ \vdots & & \vdots \\ \beta_{B,1}\mathbf{H}_{B,1} & \cdots & \beta_{B,A}\mathbf{H}_{B,A} \end{bmatrix} \tag{4}$$

where we define $\beta_{m,k} = \alpha_{m,k}/\sigma_k$. The pathloss coefficients are fixed constants that depend only on the
geometry of the system, and the small-scale fading coefficients are assumed to evolve according to a
block-fading process that changes independently from slot to slot and remains constant over each slot.
This is representative of a typical situation where the distance between BSs and users changes significantly
over a time-scale of the order of tens of seconds, while the small-scale fading decorrelates completely
within a few milliseconds [41]. Here, a "slot" indicates a block of channel uses over which the small-scale
coefficients can be considered constant. The slot length (in channel uses) is approximately equal to the
product of the channel coherence time and the channel coherence bandwidth [41].



B. Downlink Scheduling Optimization Problem 

Each cluster controller operates according to a downlink scheduling scheme that instantaneously
allocates the transmission resource (signal dimensions and transmit power) to the users. Following [22],
the scheduling problem is formulated as the maximization of a suitable strictly increasing and concave
network utility function $g(\cdot)$ over the region of achievable ergodic rates (throughput region), which is
convex by time-sharing. For users in group $k$, we define the mean group throughput $R_k$ as the
arithmetic mean of the individual user throughputs, i.e., $R_k = \frac{1}{N} \sum_{i=1}^{N} R_k^{(i)}$. By the symmetry of the
system in the users belonging to the same group, it turns out that for any achievable throughput point with
given individual user throughputs $\{R_k^{(i)}\}$, there exists an achievable throughput point such that all users
in group $k$ have throughput $R_k$. In other words, the cumulative throughput of users in the same group can
always be distributed uniformly over these users, without changing the sum throughput. We assume that
the network utility function gives the same priority to statistically equivalent users. This is captured by
restricting $g(\cdot)$ to be Schur-concave [22].² It follows that the network utility function is always maximized
at a point for which $R_k^{(i)} = R_k$ for all users $i$ in group $k$. Therefore, letting $\mathbf{R} = (R_1, \ldots, R_A)$ and
redefining the function $g(\cdot)$ to have $A$ arguments, the fairness scheduling problem is formulated directly
in terms of the user mean group throughputs, as

$$\begin{aligned} \text{maximize} \quad & g(\mathbf{R}) \\ \text{subject to} \quad & \mathbf{R} \in \mathcal{R} \end{aligned} \tag{5}$$

where $\mathcal{R}$ denotes the system $A$-dimensional achievable group throughput region. In particular, this work
considers LZFB downlink precoding. Hence, $\mathcal{R}$ indicates the group throughput region achievable by
LZFB for the channel model (3), under the assumption of operating at the Nash equilibrium described above.
A scheduling policy achieving the optimum group throughput point $\mathbf{R}^\star$, solution of (5), consists of a rule
that, at each scheduling slot, maps the available channel information $\mathbf{H}$ into a set of scheduled users,
rates and transmit powers, such that the resulting long-term time averaged group rates converge to $\mathbf{R}^\star$.

As a first step towards the solution of (5), we focus on the weighted instantaneous sum-rate
maximization problem:

$$\begin{aligned} \text{maximize} \quad & \sum_{k=1}^{A} \sum_{i=1}^{N} W_k^{(i)} R_k^{(i)} \\ \text{subject to} \quad & \mathbf{R} \in \mathcal{R}_{\mathrm{lzfb}}(\mathbf{H}) \end{aligned} \tag{6}$$

where $W_k^{(i)}$ denotes the rate weight for user $i$ in group $k$, and $\mathcal{R}_{\mathrm{lzfb}}(\mathbf{H})$ is the achievable instantaneous
rate region of LZFB for given channel matrix $\mathbf{H}$. By "instantaneous", we mean that this rate region
depends on the given channel realization $\mathbf{H}$, in contrast with the throughput region $\mathcal{R}$, which depends on
the statistics of $\mathbf{H}$. Realistically, we assume that $A \ge \gamma B$ (i.e., the number of users in the cluster is
larger than or equal to the total number of BS antennas in the cluster) and that all coefficients $\beta_{m,k}$ are
strictly positive. Therefore, we have $\mathrm{rank}(\mathbf{H}) = \gamma BN$ almost surely. In this case, LZFB cannot serve
all users in the cluster simultaneously, and the scheduler must select a subset of users not larger than
$\gamma BN$ to be served at each slot.

It should be noticed at this point that, for the sake of completeness, we consider the weighted
instantaneous sum-rate maximization problem in the most general case where the weights $W_k^{(i)}$
are distinct for each individual user. As a matter of fact, because of the system symmetry noted above, in
the large system limit we will be interested in the solution for $W_k^{(i)} = W_k$ (same weight, and therefore
same priority, for all users $i$ in the same group $k$).

²The class of $\alpha$-fairness network utility functions introduced in [22], including max-min and proportional fairness, satisfies this
condition.

The solution of (6) is generally difficult, since it requires a search over all user subsets of cardinality
less than or equal to $\gamma BN$. Well-known approaches (see [31], [32]) consider the selection of a user subset in
some greedy fashion, by adding users to the active user set one by one, until the objective function in
(6) cannot be improved further. Moreover, even for a fixed set of active users, the problem of optimal
LZFB precoding subject to a per-BS power constraint is non-trivial and has been recently addressed in
[43]-[45] through fairly involved numerical algorithms. Because of these difficulties, problem (6) has so
far escaped a clean analytical solution and most studies resorted to extensive and costly Monte Carlo
simulation.

In order to overcome the above difficulties, we make the following simplifying assumptions: 1) The
scheduler picks a fraction $\mu_k$ of users in group $k$ by random selection inside the group. The selected users
are referred to as the active users of group $k$. The active user set selection is statistically independent
over different scheduling slots; 2) The LZFB precoder is obtained by normalizing the columns of the
Moore-Penrose pseudo-inverse of the channel matrix, although this choice is not necessarily optimal
under the per-BS power constraint [43].

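Under these assumptions, we let $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_A)$ denote the fractions of active users in groups
$1, \ldots, A$, respectively. For given $\boldsymbol{\mu}$, the corresponding effective channel matrix is given by

$$\mathbf{H}_{\boldsymbol{\mu}} = \begin{bmatrix} \beta_{1,1}\mathbf{H}_{1,1}(\mu_1) & \cdots & \beta_{1,A}\mathbf{H}_{1,A}(\mu_A) \\ \vdots & & \vdots \\ \beta_{B,1}\mathbf{H}_{B,1}(\mu_1) & \cdots & \beta_{B,A}\mathbf{H}_{B,A}(\mu_A) \end{bmatrix} \tag{7}$$

where the block $\mathbf{H}_{m,k}(\mu_k)$ is a $\gamma N \times \mu_k N$ dimensional submatrix of $\mathbf{H}_{m,k}$. The user fractions must
satisfy $\mu_k \in [0, 1]$ for each $k = 1, \ldots, A$ and $\mu \triangleq \mu_{1:A} \le \gamma B$, where we introduce the notation

$$\mu_{1:k} \triangleq \sum_{j=1}^{k} \mu_j. \tag{8}$$

Hence, we have $\mathrm{rank}(\mathbf{H}_{\boldsymbol{\mu}}) = \mu N$ almost surely.

With LZFB precoding, the transmitted signal is given by

$$\mathbf{x}_{\boldsymbol{\mu}} = \mathbf{V}_{\boldsymbol{\mu}} \mathbf{Q}^{1/2} \mathbf{u} \tag{9}$$

where $\mathbf{u} \in \mathbb{C}^{\mu N}$ contains the users' information-bearing code symbols, statistically independent with
mean zero and variance 1, $\mathbf{V}_{\boldsymbol{\mu}}$ is the precoding matrix with unit-norm columns, and $\mathbf{Q}$ is a diagonal
matrix which contains the user transmit powers on the diagonal. The precoding matrix is obtained
as follows. The Moore-Penrose (right) pseudo-inverse of $\mathbf{H}_{\boldsymbol{\mu}}^{\mathsf{H}}$ is given by

$$\mathbf{H}_{\boldsymbol{\mu}}^{+} = \mathbf{H}_{\boldsymbol{\mu}} \left(\mathbf{H}_{\boldsymbol{\mu}}^{\mathsf{H}} \mathbf{H}_{\boldsymbol{\mu}}\right)^{-1}. \tag{10}$$

Then, we let $\mathbf{V}_{\boldsymbol{\mu}} = \mathbf{H}_{\boldsymbol{\mu}}^{+} \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2}$, where the column-normalizing diagonal matrix $\boldsymbol{\Lambda}_{\boldsymbol{\mu}}$ contains the reciprocal
of the squared norm of the columns of $\mathbf{H}_{\boldsymbol{\mu}}^{+}$ on the diagonal. Letting $\Lambda_k^{(i)}(\boldsymbol{\mu})$ denote the diagonal element
of $\boldsymbol{\Lambda}_{\boldsymbol{\mu}}$ in position $\mu_{1:k-1}N + i$, for $i = 1, \ldots, \mu_k N$, we have

$$\Lambda_k^{(i)}(\boldsymbol{\mu}) = \frac{1}{\left[\left(\mathbf{H}_{\boldsymbol{\mu}}^{\mathsf{H}} \mathbf{H}_{\boldsymbol{\mu}}\right)^{-1}\right]_k^{(i)}} \tag{11}$$

where $[\cdot]_k^{(i)}$ denotes the element in the corresponding position $\mu_{1:k-1}N + i$ of the main
diagonal of the matrix $(\mathbf{H}_{\boldsymbol{\mu}}^{\mathsf{H}} \mathbf{H}_{\boldsymbol{\mu}})^{-1}$. Rewriting (3) with (7) and (9) and noticing that $\mathbf{H}_{\boldsymbol{\mu}}^{\mathsf{H}} \mathbf{V}_{\boldsymbol{\mu}} = \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2}$,
we arrive at the "parallel" channel model for all active users in the form

$$\mathbf{y}_{\boldsymbol{\mu}} = \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2} \mathbf{Q}^{1/2} \mathbf{u} + \mathbf{z}_{\boldsymbol{\mu}}. \tag{12}$$

The optimization of $\boldsymbol{\mu}$ for the channel model (12) is still involved, since the channel coefficients $\Lambda_k^{(i)}(\boldsymbol{\mu})$
depend on the active user fractions $\boldsymbol{\mu}$ in a complicated and non-convex way. As an intermediate step, we
consider the solution of (6) for fixed user fractions $\boldsymbol{\mu}$.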
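The precoder construction in (10)-(12) can be checked numerically on a small random channel. The following is a minimal sketch, not the authors' implementation: the function name `lzfb_precoder`, the matrix sizes and the random seed are illustrative assumptions, and pathloss scaling is omitted. It builds the column-normalized pseudo-inverse and confirms that the effective channel is diagonal with entries $\sqrt{\Lambda}$.

```python
import numpy as np

def lzfb_precoder(H_mu):
    """Column-normalized right pseudo-inverse of H_mu^H (hypothetical helper).

    H_mu has shape (n_tx, n_users) with n_users <= n_tx and full column rank.
    Returns (V, lam) where V has unit-norm columns and H_mu^H V = diag(sqrt(lam)).
    """
    gram_inv = np.linalg.inv(H_mu.conj().T @ H_mu)   # (H^H H)^{-1}
    H_plus = H_mu @ gram_inv                          # pseudo-inverse, eq. (10)
    lam = 1.0 / np.real(np.diag(gram_inv))            # channel gains, eq. (11)
    V = H_plus * np.sqrt(lam)                         # rescale every column to unit norm
    return V, lam

# Illustrative sizes only: 8 transmit antennas, 5 active users.
rng = np.random.default_rng(0)
n_tx, n_users = 8, 5
H = (rng.standard_normal((n_tx, n_users))
     + 1j * rng.standard_normal((n_tx, n_users))) / np.sqrt(2)

V, lam = lzfb_precoder(H)
effective = H.conj().T @ V                            # should equal diag(sqrt(lam)), eq. (12)
print(np.round(np.abs(effective), 3))
```

The squared norm of column $j$ of the pseudo-inverse equals the $j$-th diagonal entry of $(\mathbf{H}^{\mathsf{H}}\mathbf{H})^{-1}$, which is why a single diagonal rescaling both normalizes the columns and yields the gains in (11).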



C. Power Allocation under Sum-power or Per-BS Power Constraints 

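We divide all channel matrix coefficients by $\sqrt{N}$ and multiply the BS input power constraints $P_m$ by
$N$, thus obtaining an equivalent system where the channel coefficients have variance that scales as $1/N$.
This is useful when considering the large-system limit for $N \to \infty$ in Section III.

Let $q_k^{(i)}$ denote the diagonal element in position $\mu_{1:k-1}N + i$ of $\mathbf{Q}$, corresponding to the power allocated
to the $i$-th user of group $k$. The sum-power constraint is given by

$$\frac{1}{N} \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} q_k^{(i)} \le P_{\mathrm{sum}}, \tag{13}$$

where $P_{\mathrm{sum}} = \sum_{m=1}^{B} P_m$. In order to express the per-BS power constraint, let $\boldsymbol{\Phi}_m$ denote a diagonal
matrix with all zeros, but for $\gamma N$ consecutive "1", corresponding to positions from $(m-1)\gamma N + 1$ to
$m\gamma N$ on the main diagonal. Then, the per-BS power constraint is expressed in terms of the partial trace
of the transmitted signal covariance matrix as

$$\frac{1}{N} \mathrm{tr}\left(\boldsymbol{\Phi}_m \mathbf{V}_{\boldsymbol{\mu}} \mathbf{Q} \mathbf{V}_{\boldsymbol{\mu}}^{\mathsf{H}}\right) \le P_m, \quad m = 1, \ldots, B. \tag{14}$$

More explicitly, (14) can be written in terms of the powers $q_k^{(i)}$ as

$$\sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} q_k^{(i)} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) \le P_m, \quad m = 1, \ldots, B, \tag{15}$$

where we define the coefficients

$$\theta_{m,k}^{(i)}(\boldsymbol{\mu}) = \frac{1}{N} \sum_{\ell=(m-1)\gamma N+1}^{m\gamma N} \left| [\mathbf{V}_{\boldsymbol{\mu}}]_{\ell,k}^{(i)} \right|^2 \tag{16}$$

and where $[\mathbf{V}_{\boldsymbol{\mu}}]_{\ell,k}^{(i)}$ denotes the element of $\mathbf{V}_{\boldsymbol{\mu}}$ corresponding to the $\ell$-th row and the $(\mu_{1:k-1}N + i)$-th
column. Since $\mathbf{V}_{\boldsymbol{\mu}}$ has unit-norm columns, $\sum_{m=1}^{B} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) = 1/N$ for all $k, i$.

For fixed user fractions $\boldsymbol{\mu}$, the weighted instantaneous sum-rate maximization (6) reduces to

$$\text{maximize} \quad \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} W_k^{(i)} \log\left(1 + \Lambda_k^{(i)}(\boldsymbol{\mu})\, q_k^{(i)}\right) \tag{17}$$

subject to (13) in the case of sum-power constraint, or to (15) for the case of per-BS power constraint.

The solution of (17) subject to the sum-power constraint is immediately given by the water-filling
formula

$$q_k^{(i)} = \left[ \frac{W_k^{(i)}}{\lambda} - \frac{1}{\Lambda_k^{(i)}(\boldsymbol{\mu})} \right]_+ \tag{18}$$

where $\lambda \ge 0$ is the Lagrange multiplier corresponding to the sum-power constraint.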
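The water-filling rule (18) leaves the multiplier $\lambda$ to be found from the power budget. A minimal sketch of one way to do this is bisection on $\lambda$; the function name `weighted_waterfilling`, the flat indexing of users, and the use of a total budget (the right-hand side of (13) multiplied by $N$) are assumptions made here for illustration.

```python
import numpy as np

def weighted_waterfilling(weights, gains, budget, tol=1e-12):
    """Maximize sum_j w_j*log(1 + g_j*q_j) subject to sum_j q_j <= budget, q_j >= 0.

    Bisection on the multiplier lam; the powers follow q_j = [w_j/lam - 1/g_j]_+.
    """
    w = np.asarray(weights, dtype=float)
    g = np.asarray(gains, dtype=float)

    def power(lam):
        return np.maximum(w / lam - 1.0 / g, 0.0)

    lo, hi = 0.0, np.max(w * g)       # at lam = max(w*g) every user gets zero power
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if power(mid).sum() > budget:
            lo = mid                  # budget exceeded: the multiplier must grow
        else:
            hi = mid                  # feasible: try a smaller multiplier
    return power(hi), hi

# Three users with equal weights; the weakest user is left without power.
q, lam = weighted_waterfilling([1.0, 1.0, 1.0], [2.0, 1.0, 0.1], budget=1.0)
print(q, lam)
```

In this toy case the two strongest users share the budget as 0.75 and 0.25, since the common water level $1/\lambda = 1.25$ lies below the inverse gain of the third user.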

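In the case of per-BS power constraint, we can use Lagrange duality and the subgradient iteration
method as given in the following. The Lagrangian function for (17) is given by

$$\mathcal{L}(\mathbf{q}, \boldsymbol{\lambda}) = \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} W_k^{(i)} \log\left(1 + \Lambda_k^{(i)}(\boldsymbol{\mu})\, q_k^{(i)}\right) - \boldsymbol{\lambda}^{\mathsf{T}} \left[\boldsymbol{\Theta} \mathbf{q} - \mathbf{P}\right] \tag{19}$$

where $\boldsymbol{\lambda} \ge 0$ is a vector of dual variables corresponding to the $B$ BS power constraints, $\boldsymbol{\Theta}$ is the $B \times \mu N$
matrix containing the coefficients $\theta_{m,k}^{(i)}(\boldsymbol{\mu})$ and $\mathbf{P} = (P_1, \ldots, P_B)^{\mathsf{T}}$. The KKT conditions are given by

$$\frac{\partial \mathcal{L}}{\partial q_k^{(i)}} = \frac{W_k^{(i)} \Lambda_k^{(i)}(\boldsymbol{\mu})}{1 + \Lambda_k^{(i)}(\boldsymbol{\mu})\, q_k^{(i)}} - \boldsymbol{\lambda}^{\mathsf{T}} \boldsymbol{\theta}_k^{(i)}(\boldsymbol{\mu}) \le 0 \tag{20}$$

where $\boldsymbol{\theta}_k^{(i)}(\boldsymbol{\mu})$ indicates the column of $\boldsymbol{\Theta}$ containing the coefficients $\theta_{m,k}^{(i)}(\boldsymbol{\mu})$ for $m = 1, \ldots, B$. Solving for
$q_k^{(i)}$, we find

$$q_k^{(i)}(\boldsymbol{\lambda}) = \left[ \frac{W_k^{(i)}}{\boldsymbol{\lambda}^{\mathsf{T}} \boldsymbol{\theta}_k^{(i)}(\boldsymbol{\mu})} - \frac{1}{\Lambda_k^{(i)}(\boldsymbol{\mu})} \right]_+ \tag{21}$$

Replacing this solution into $\mathcal{L}(\mathbf{q}, \boldsymbol{\lambda})$, we solve the dual problem by minimizing $\mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda})$ with respect
to $\boldsymbol{\lambda} \ge 0$. It is immediate to check that for any $\boldsymbol{\lambda}' \ge 0$,

$$\begin{aligned} \mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}'), \boldsymbol{\lambda}') &\ge \mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda}') \\ &= (\boldsymbol{\lambda}' - \boldsymbol{\lambda})^{\mathsf{T}} \left(\mathbf{P} - \boldsymbol{\Theta} \mathbf{q}(\boldsymbol{\lambda})\right) + \mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda}). \end{aligned} \tag{22}$$

Therefore, $(\mathbf{P} - \boldsymbol{\Theta} \mathbf{q}(\boldsymbol{\lambda}))$ is a subgradient for $\mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda})$. It follows that the dual problem can be solved
by a simple $B$-dimensional subgradient iteration over the vector of dual variables $\boldsymbol{\lambda}$.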
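The dual iteration can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the algorithm used for the paper's numerical results: the function name `per_bs_power_allocation`, the diminishing step size `step0 / sqrt(t)`, the initial point, and the small positive floor on the dual variables are all choices made here. It alternates the primal update (21) with a projected subgradient step on $\boldsymbol{\lambda}$.

```python
import numpy as np

def per_bs_power_allocation(weights, gains, theta, p_max, step0=0.1, iters=2000):
    """Projected subgradient iteration on the dual of the per-BS constrained problem.

    theta has shape (B, n_users); column j holds the coefficients of user j.
    Assumes every column of theta has at least one positive entry.
    Returns the last primal iterate q and the dual variables lam.
    """
    w = np.asarray(weights, dtype=float)
    g = np.asarray(gains, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p = np.asarray(p_max, dtype=float)

    lam = np.ones(theta.shape[0])
    for t in range(1, iters + 1):
        price = lam @ theta                          # lambda^T theta_j for each user j
        q = np.maximum(w / price - 1.0 / g, 0.0)     # primal update, eq. (21)
        subgrad = p - theta @ q                      # subgradient of the dual function
        lam = np.maximum(lam - step0 / np.sqrt(t) * subgrad, 1e-12)
    price = lam @ theta
    q = np.maximum(w / price - 1.0 / g, 0.0)
    return q, lam

# Toy case: each BS serves exactly one user, so both constraints must be tight.
q, lam = per_bs_power_allocation(
    weights=[1.0, 1.0], gains=[1.0, 1.0], theta=np.eye(2), p_max=[1.0, 1.0])
print(q, lam)
```

In the decoupled toy case the optimum is known in closed form ($q = 1$ and $\lambda = 1/2$ for each BS), which gives a quick sanity check; for general $\boldsymbol{\Theta}$ the step-size schedule may need tuning.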

III. Large System Limit 

In this section, we consider the limit of the above instantaneous rate maximization problems for
$N \to \infty$ and fixed $\gamma$, $A$, $B$, and $\boldsymbol{\mu}$. We shall see in Section III-D that the weights $W_k^{(i)}$ in the weighted
instantaneous sum-rate maximization are recursively calculated by the scheduler that solves the general
network utility maximization problem (5). For Schur-concave $g(\cdot)$ and in the large system limit, where
the "instantaneous" channel gains $\Lambda_k^{(i)}(\boldsymbol{\mu})$ freeze to deterministic limits that depend only on the group
index $k$ and not on the individual user index $i$ (see Theorem 1 below), these weights must be identical
for all users in the same group. Since the weighted-sum rate maximization problem is used here as
an intermediate step to devise the scheduling rule that solves (5), from now on we restrict to the case
$W_k^{(i)} = W_k$ for all users $i$ in group $k$.
A. Asymptotic Analysis 

We start by finding the large system limit expression for the coefficients $\Lambda_k^{(i)}(\boldsymbol{\mu})$. This is provided by:
Theorem 1: For all $i = 1, \ldots, \mu_k N$, the following limit holds almost surely:

$$\lim_{N \to \infty} \Lambda_k^{(i)}(\boldsymbol{\mu}) = \Lambda_k(\boldsymbol{\mu}) = \gamma \sum_{m=1}^{B} \beta_{m,k}^2\, \eta_m(\boldsymbol{\mu}) \tag{23}$$

where $(\eta_1(\boldsymbol{\mu}), \ldots, \eta_B(\boldsymbol{\mu}))$ is the unique solution in $[0, 1]^B$ of the system of fixed-point equations

$$\eta_m = 1 - \sum_{q=1}^{A} \mu_q \frac{\eta_m \beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \beta_{\ell,q}^2}, \quad m = 1, \ldots, B, \tag{24}$$

with respect to the variables $\boldsymbol{\eta} = \{\eta_m\}$.

Proof: See Appendix A. ■
As anticipated above, the limit (23) depends only on $k$ (user group index) and not on $i$ (user index
within the group), consistent with the fact that, in our model, users in the same co-located group
are statistically equivalent. An immediate consequence is that in the limit for $N \to \infty$, for the case
$W_k^{(i)} = W_k$ considered here, the waterfilling equation (18) yields equal power allocation $q_k^{(i)} = q_k$ for
all active users $i$ in group $k$.
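The fixed-point system of Theorem 1 is easy to evaluate numerically. The sketch below assumes the form $\eta_m = 1 - \sum_q \mu_q \eta_m \beta_{m,q}^2 / (\gamma \sum_\ell \eta_\ell \beta_{\ell,q}^2)$ and uses a damped fixed-point iteration started from $\boldsymbol{\eta} = \mathbf{1}$; the function name, the damping factor and the stopping rule are illustrative choices, and convergence of this particular iteration is assumed rather than proved here.

```python
import numpy as np

def zf_asymptotic_gains(beta_sq, mu, gamma, damping=0.5, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration for eta, followed by the limit gains Lambda_k.

    beta_sq : (B, A) array of squared gains beta_{m,k}^2
    mu      : length-A vector of active user fractions
    gamma   : number of antennas per BS divided by N
    """
    beta_sq = np.asarray(beta_sq, dtype=float)
    mu = np.asarray(mu, dtype=float)
    eta = np.ones(beta_sq.shape[0])
    for _ in range(max_iter):
        denom = gamma * (eta @ beta_sq)               # gamma * sum_l eta_l beta_{l,q}^2
        target = 1.0 - eta * (beta_sq * mu / denom).sum(axis=1)
        new = (1.0 - damping) * eta + damping * target
        if np.max(np.abs(new - eta)) < tol:
            eta = new
            break
        eta = new
    return eta, gamma * (eta @ beta_sq)               # (eta, Lambda)

# Single-cell sanity check: B = A = 1, unit gain, gamma = 2, mu = 0.5.
eta, lam = zf_asymptotic_gains([[1.0]], [0.5], gamma=2.0)
print(eta, lam)
```

For a single cell with unit pathloss the system collapses to $\eta = 1 - \mu/\gamma$ and $\Lambda = \gamma - \mu$, which matches the familiar zero-forcing gain of a $\gamma N$-antenna transmitter serving $\mu N$ users.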

Next, we consider the per-BS power constraint given in (15). By the system symmetry and for the
sake of analytical tractability, also in this case we restrict to uniform power allocation $q_k^{(i)} = q_k$ for all
active users $i$ in the same group $k$. Replacing this in the constraint (15), we obtain

$$\sum_{k=1}^{A} q_k\, \theta_{m,k}(\boldsymbol{\mu}) \le P_m, \tag{25}$$

where we define

$$\theta_{m,k}(\boldsymbol{\mu}) = \sum_{i=1}^{\mu_k N} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) = \frac{1}{N} \sum_{i=1}^{\mu_k N} \sum_{\ell=1+(m-1)\gamma N}^{m\gamma N} \left| [\mathbf{V}_{\boldsymbol{\mu}}]_{\ell,k}^{(i)} \right|^2. \tag{26}$$

It is interesting to notice that $\theta_{m,k}(\boldsymbol{\mu})$ is the squared Frobenius norm (normalized by $N$) of the submatrix
of $\mathbf{V}_{\boldsymbol{\mu}}$ corresponding to the users in group $k$ (columns from $\mu_{1:k-1}N + 1$ to $\mu_{1:k}N$) and the antennas
of BS $m$ (rows from $(m-1)\gamma N + 1$ to $m\gamma N$).

We hasten to point out that the choice $q_k^{(i)} = q_k$ does not follow immediately from the KKT conditions
(20), even assuming $W_k^{(i)} = W_k$ and in the limit of $\Lambda_k^{(i)}(\boldsymbol{\mu}) \to \Lambda_k(\boldsymbol{\mu})$, since the terms $\theta_{m,k}^{(i)}(\boldsymbol{\mu})$ may
depend on $i$ and not only on $k$ even in the large system limit. As a matter of fact, Theorem 2 below
yields the convergence of $\theta_{m,k}(\boldsymbol{\mu}) = \sum_{i=1}^{\mu_k N} \theta_{m,k}^{(i)}(\boldsymbol{\mu})$ to a deterministic limit. Notice that each individual
term $\theta_{m,k}^{(i)}(\boldsymbol{\mu})$ vanishes as $N \to \infty$ since, by construction, $\sum_{m=1}^{B} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) = 1/N$. Based on numerical
evidence, we conjecture that $|N \theta_{m,k}^{(i)}(\boldsymbol{\mu}) - \theta_{m,k}(\boldsymbol{\mu})/\mu_k| \to 0$ as $N \to \infty$, for all $i$ in group $k$. However,
proving the convergence of the individual random variables $N \theta_{m,k}^{(i)}(\boldsymbol{\mu})$ to the same deterministic limit
independent of $i$ has resisted our efforts. In conclusion, beyond the sake of analytical tractability and
system symmetry considerations, we also conjecture that the symmetric power allocation $q_k^{(i)} = q_k$ for
users in the same group is optimal for the case of per-BS power constraint, in the limit of $N \to \infty$.
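The next result yields an analytical expression for the large-system limits of the coefficients $\theta_{m,k}(\boldsymbol{\mu})$.
Theorem 2: For all $m, k$, the following limit holds almost surely:

$$\lim_{N \to \infty} \theta_{m,k}(\boldsymbol{\mu}) = \frac{\gamma\, \mu_k\, \eta_m^2(\boldsymbol{\mu}) \left(\beta_{m,k}^2 + \xi_{m,k}\right)}{\Lambda_k(\boldsymbol{\mu})} \tag{27}$$

where $\boldsymbol{\xi}_m = (\xi_{m,1}, \ldots, \xi_{m,A})^{\mathsf{T}}$ is the solution to the linear system

$$[\mathbf{I} - \gamma \mathbf{M}]\, \boldsymbol{\xi}_m = \gamma \mathbf{M} \mathbf{b}_m, \tag{28}$$

where $\mathbf{M}$ is the $A \times A$ matrix

$$\mathbf{M} = \left[ \sum_{\ell=1}^{B} \eta_\ell^2(\boldsymbol{\mu})\, \mathbf{b}_\ell \mathbf{b}_\ell^{\mathsf{T}} \right] \mathrm{diag}\left( \frac{\mu_1}{\Lambda_1^2(\boldsymbol{\mu})}, \ldots, \frac{\mu_A}{\Lambda_A^2(\boldsymbol{\mu})} \right) \tag{29}$$

and $\mathbf{b}_\ell = (\beta_{\ell,1}^2, \ldots, \beta_{\ell,A}^2)^{\mathsf{T}}$. The coefficients $\{\eta_m(\boldsymbol{\mu})\}$ and $\{\Lambda_k(\boldsymbol{\mu})\}$ are provided by Theorem 1.

Proof: See Appendix B. ■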
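Theorem 2 reduces the per-BS power coefficients to one fixed-point problem and one linear solve. The sketch below spells out, in its comments, the forms of (23), (24) and (27)-(29) that it assumes, and evaluates them for the settings that Example 1 adopts later in this section. Two assumptions are specific to this illustration: the entries $a, \ldots, f$ of the gain matrix are treated as the squared gains $\beta_{m,k}^2$ (under this reading the output agrees with Table I to three decimals), and the fixed point is computed by a plain damped iteration.

```python
import numpy as np

def asymptotic_theta(beta_sq, mu, gamma, iters=500):
    """Large-system coefficients theta_{m,k} under the assumed forms of (23)-(29).

    beta_sq : (B, A) array of squared gains; row l is the vector b_l.
    Returns (eta, lam, theta) with theta of shape (B, A).
    """
    beta_sq = np.asarray(beta_sq, dtype=float)
    mu = np.asarray(mu, dtype=float)
    B, A = beta_sq.shape

    eta = np.ones(B)
    for _ in range(iters):                             # damped iteration for (24)
        ratio = (beta_sq * mu / (gamma * (eta @ beta_sq))).sum(axis=1)
        eta = 0.5 * eta + 0.5 * (1.0 - eta * ratio)
    lam = gamma * (eta @ beta_sq)                      # (23)

    # (29): M = [sum_l eta_l^2 b_l b_l^T] diag(mu_k / Lambda_k^2)
    M = ((beta_sq.T * eta**2) @ beta_sq) * (mu / lam**2)
    # (28): solve [I - gamma M] xi_m = gamma M b_m for all m at once (columns of xi)
    xi = np.linalg.solve(np.eye(A) - gamma * M, gamma * M @ beta_sq.T)
    # (27): theta_{m,k} = gamma mu_k eta_m^2 (beta_{m,k}^2 + xi_{m,k}) / Lambda_k
    theta = gamma * mu * eta[:, None]**2 * (beta_sq + xi.T) / lam
    return eta, lam, theta

# Settings of Example 1 (B = 2, A = 8, gamma = 4).
a, b, c, d, e, f = 1.5, 1.3, 1.0, 0.5, 0.3, 0.2
beta_sq = np.array([[a, b, c, c, f, e, e, d],
                    [f, e, e, d, a, b, c, c]])
mu = np.array([0.5, 0.5, 0.75, 1.0, 0.5, 0.5, 0.75, 1.0])

eta, lam, theta = asymptotic_theta(beta_sq, mu, gamma=4.0)
print(np.round(theta, 3))
```

A useful consistency check is that each column of `theta` sums to the corresponding $\mu_k$, as required by the unit-norm columns of the precoder.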
Fig. 1. Linear one-sided 2-cell model with B = 2 BSs and A = 8 user groups.



1) Simplifications for Symmetric System: Under some special symmetry conditions, the general
problem can be significantly simplified. In particular, we assume that $B$ divides $A$, let $A' = A/B$, and that
the $B \times A$ matrix of channel gains $\boldsymbol{\beta} = [\beta_{m,k}^2]$ can be partitioned into $A'$ circulant submatrices of size
$B \times B$, i.e., such that each submatrix has the property that all rows are cyclic shifts of the first row and
all columns are cyclic shifts of the first column. We shall refer to these submatrices as "circulant blocks."
Also, we define users' equivalence classes as the sets of user groups whose corresponding columns of
$\boldsymbol{\beta}$ form circulant blocks. Then, the set of $A$ user groups is partitioned into $A'$ equivalence classes. We
re-index the user groups such that groups $\{(j-1)A' + i : j = 1, \ldots, B\}$ form the $i$-th equivalence class,
for $i = 1, \ldots, A'$. To fix ideas, consider Fig. 1, showing a one-dimensional cellular system comprising 2
BSs and 8 symmetrically located user groups. Since the channel gain coefficients depend on distance,
because of the symmetric layout, the matrix $\boldsymbol{\beta}$ is given by

$$\boldsymbol{\beta} = \begin{bmatrix} a & b & c & c & f & e & e & d \\ f & e & e & d & a & b & c & c \end{bmatrix} \tag{30}$$

for some positive numbers $a, b, c, d, e, f$. We notice that this matrix can be decomposed into the $A' = 4$
circulant blocks

$$\begin{bmatrix} a & f \\ f & a \end{bmatrix}, \quad \begin{bmatrix} b & e \\ e & b \end{bmatrix}, \quad \begin{bmatrix} c & e \\ e & c \end{bmatrix}, \quad \begin{bmatrix} c & d \\ d & c \end{bmatrix}.$$

When this symmetry condition holds, users in the same equivalence class are statistically equivalent, 
up to renumbering of the BSs. This is because such groups (e.g., user group pairs (1, 5), (2, 6), (3, 7), 
and (4, 8) in the example) are equivalent as far as the "landscape" of channel gain coefficients seen 
collectively by the cluster BSs. 
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Motivated again by the Schur-concavity of the network utility function and by the symmetry of all 
users in the same equivalent class, for the sake of finding an analytically tractable solution and an overall 
problem simplification, we shall restrict to the resource allocation that gives to all groups k in the same 
equivalence class i the same transmit power and active user fraction (symmetric solution). We indicate 
these common powers and active user fractions by q[ and respectively, for z = 1, . . . , such that 
q(^j-i)A'+i = Qi ^iid = /x^ for all j = 1, . . . , B. In this case, for any m, we have 

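$$
\sum_{q=1}^{A} \mu_q \frac{\beta_{m,q}^2}{\sum_{\ell=1}^{B} \beta_{\ell,q}^2}
= \sum_{i=1}^{A'} \mu'_i \sum_{j=1}^{B} \frac{\beta_{m,(j-1)A'+i}^2}{\sum_{\ell=1}^{B} \beta_{\ell,(j-1)A'+i}^2}
= \sum_{i=1}^{A'} \mu'_i ,
$$

where we used the fact that, by the symmetry condition, $\sum_{j=1}^{B} \beta_{m,(j-1)A'+i}^2 \big/ \sum_{\ell=1}^{B} \beta_{\ell,(j-1)A'+i}^2 = 1$ and $\sum_{i=1}^{A'} \mu'_i = \frac{1}{B} \sum_{q=1}^{A} \mu_q = \mu/B$. It follows that the solution of the fixed-point equation (24) is given explicitly by

$$
\eta_m(\boldsymbol{\mu}) = 1 - \frac{\mu}{\gamma B}, \qquad (31)
$$

which is independent of $m$, and (23) yields

$$
\Lambda_k(\boldsymbol{\mu}) = \left( \gamma - \frac{\mu}{B} \right) \sum_{m=1}^{B} \beta_{m,k}^2. \qquad (32)
$$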

Notice that for all groups $k$ in class $i$, i.e., for all $k = (j-1)A' + i$, $j = 1, \ldots, B$, the sum $\sum_{m=1}^{B} \beta_{m,k}^2$
is a constant independent of $k$. Therefore, as expected from the symmetry condition, all active users in
the same equivalence class have the same LZFB channel gains. With some abuse of notation, we define
$\beta_i'^2 = \sum_{m=1}^{B} \beta_{m,k}^2$ for groups $k$ in the $i$-th equivalence class. In addition, we have:

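Lemma 1: For symmetric systems with $\mu_k = \mu'_i$ for all $k$ in equivalence class $i$, we have

$$
\theta_{m,k}(\boldsymbol{\mu}) = \theta_{m \oplus_B j,\; k \oplus_A A'j}(\boldsymbol{\mu}), \qquad (33)
$$

where we define the cyclic index shifts $m \oplus_B j = ((m + j - 1) \bmod B) + 1$ and $k \oplus_A A'j = ((k + A'j - 1) \bmod A) + 1$, and furthermore, if $q_k = q'_i$ for all $k$ in equivalence class $i$, the transmit
power of BS $m$ is independent of $m$, i.e.,

$$
\sum_{k=1}^{A} q_k \theta_{m,k}(\boldsymbol{\mu}) = \sum_{i=1}^{A'} q'_i \mu'_i. \qquad (34)
$$

Proof: See Appendix C. ■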
As an immediate corollary, if all the BSs in the cluster have the same power constraint,
i.e., $P_1 = \cdots = P_B = P$, then the per-BS power constraint (25) coincides with the sum power constraint
with $P_{\rm sum} = BP$.






TABLE I

Asymptotic values of $\theta_{m,k}(\boldsymbol{\mu})$ in the symmetric case under the settings given in Example 1

m \ k     1      2      3      4      5      6      7      8
  1     0.325  0.311  0.454  0.565  0.175  0.189  0.296  0.435
  2     0.175  0.189  0.296  0.435  0.325  0.311  0.454  0.565



Example 1: Consider the two-cell symmetric model in Fig. 1 and assume that the two BSs are
cooperating ($B = 2$) and serving $A = 8$ user groups, the channel gain coefficients are given as in (30) with
$[a\; b\; c\; d\; e\; f] = [1.5\; 1.3\; 1.0\; 0.5\; 0.3\; 0.2]$, and the antenna ratio is $\gamma = 4$. If $\boldsymbol{\mu} =
[0.5\; 0.5\; 0.75\; 1\; 0.5\; 0.5\; 0.75\; 1]$, the asymptotic values of $\theta_{m,k}(\boldsymbol{\mu})$ are given in Table I. We notice
that the same block-circulant form of the gain matrix $\boldsymbol{\beta}$ appears in the matrix $\{\theta_{m,k}(\boldsymbol{\mu})\}$, as stated by
Lemma 1. ■

B. Weighted Sum-rate Maximization 

Using the asymptotic results obtained before, we consider the weighted sum-rate maximization problem
in (17). First, we focus on the sum-power constraint (13). As noticed before, in the large-system limit
and in the case of uniform weights $W_k^{(i)} = W_k$ over each group $k$, the weighted sum-rate maximization
solution allocates the same power to all the active users in the same group, which therefore
achieve the same instantaneous rate. In these conditions, from (23) we have that the instantaneous mean
group rate converges to $\frac{1}{N} \sum_{i=1}^{\mu_k N} \mathsf{R}_k^{(i)} \to \mu_k \mathsf{R}_k$, with

$$
\mathsf{R}_k = \log\left(1 + \Lambda_k(\boldsymbol{\mu})\, q_k\right). \qquad (35)
$$

Notice that in the large-system limit $\mathsf{R}_k$ is a deterministic quantity; therefore, the mean group throughput
in this regime is also given by $R_k = \mu_k \mathsf{R}_k$.



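Using (24), we can write the large-system limit weighted sum-rate maximization problem subject to
the sum-power constraint in the form:

$$
\begin{aligned}
\text{maximize} \quad & \sum_{k=1}^{A} W_k \mu_k \log\left(1 + \gamma \left(\sum_{m=1}^{B} \beta_{m,k}^2 \eta_m\right) q_k\right) && (36\text{a}) \\
\text{subject to} \quad & \sum_{k=1}^{A} \mu_k q_k \le P_{\rm sum}, \quad \sum_{k=1}^{A} \mu_k \le \gamma B, && (36\text{b}) \\
& \eta_m = 1 - \sum_{q=1}^{A} \mu_q \frac{\eta_m \beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \beta_{\ell,q}^2}, \quad m = 1, \ldots, B, && (36\text{c}) \\
& 0 \le \eta_m \le 1, \quad m = 1, \ldots, B, && (36\text{d}) \\
& q_k \ge 0, \quad 0 \le \mu_k \le 1, \quad k = 1, \ldots, A. && (36\text{e})
\end{aligned}
$$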

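This problem is generally non-convex in $\mathbf{q}$, $\boldsymbol{\mu}$ and $\boldsymbol{\eta}$. However, for fixed $\boldsymbol{\eta}$ and $\boldsymbol{\mu}$, it is convex in $\mathbf{q}$, and
the solution is given by water-filling (see also ([TS])):

$$
q_k = \left[ \frac{W_k}{\lambda} - \frac{1}{\gamma \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m} \right]_+, \qquad (37)
$$

where $\lambda$ is the Lagrange multiplier associated with the sum-power constraint. For fixed $\boldsymbol{\eta}$ and $\mathbf{q}$, we have a linear program with respect to $\boldsymbol{\mu}$. Finally, for fixed $\boldsymbol{\mu}$ and $\mathbf{q}$ the problem
is degenerate with respect to $\boldsymbol{\eta}$, since the equality constraint (36c), which corresponds to the fixed-point
equation (24), has a unique solution $\boldsymbol{\eta} \in [0, 1]^B$ for all feasible $\boldsymbol{\mu}$.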
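The water-filling step can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the water level is found by bisection on the multiplier $\lambda$ so that $\sum_k \mu_k q_k = P_{\rm sum}$, and `gains[k]` stands for the effective gain $\gamma \sum_m \beta_{m,k}^2 \eta_m$. The function name, the bisection bounds and the toy numbers are assumptions made for the example.

```python
def waterfill(weights, mu, gains, p_sum, iters=200):
    """Water-filling powers q_k = max(W_k / lam - 1 / gains_k, 0),
    with lam chosen by bisection so that sum_k mu_k * q_k = p_sum."""
    n = len(mu)
    active = [k for k in range(n) if mu[k] > 0 and gains[k] > 0 and weights[k] > 0]
    if not active:
        return [0.0] * n

    def used_power(lam):
        return sum(mu[k] * max(weights[k] / lam - 1.0 / gains[k], 0.0) for k in active)

    # used_power is decreasing in lam and equals zero at lam = max_k W_k * gains_k.
    lo, hi = 1e-12, max(weights[k] * gains[k] for k in active)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used_power(mid) > p_sum:
            lo = mid          # too much power used: raise lam (lower the water level)
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [max(weights[k] / lam - 1.0 / gains[k], 0.0) if k in active else 0.0
            for k in range(n)]

# Toy instance: the third group has a weak channel and receives no power.
q = waterfill([1.0, 1.0, 1.0], [0.5, 0.5, 1.0], [2.0, 1.0, 0.1], 1.0)
```

In the toy instance the first two groups share the whole budget, while the weak third group stays below the water level.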

In the symmetric system case with the conditions given in the previous section, user
groups in the same equivalence class are completely symmetric, since the limits $\Lambda_k(\boldsymbol{\mu})$ depend only on
the equivalence class and not on the specific user group in the class. Then, the optimization problem in
the symmetric case reduces to optimizing the powers $q'_i$ and the fractions $\mu'_i$ for the equivalence classes
$i = 1, \ldots, A'$. Letting $\mu' = \sum_{i=1}^{A'} \mu'_i = \mu/B$, we can state the optimization problem in the symmetric
case as:

$$
\begin{aligned}
\text{maximize} \quad & B \sum_{i=1}^{A'} W_i \mu'_i \log\left(1 + (\gamma - \mu')\, \beta_i'^2\, q'_i\right) && (38\text{a}) \\
\text{subject to} \quad & B \sum_{i=1}^{A'} \mu'_i q'_i \le P_{\rm sum}, && (38\text{b}) \\
& \sum_{i=1}^{A'} \mu'_i \le \gamma, && (38\text{c}) \\
& q'_i \ge 0, \quad 0 \le \mu'_i \le 1, \quad i = 1, \ldots, A'. && (38\text{d})
\end{aligned}
$$

The net effect of the symmetry is a sort of "resource pooling": the system with a cluster of $B$ cooperating
BSs reduces to an equivalent single-BS system with total transmit power $P_{\rm sum}/B$, load $\mu' = \mu/B$,
$A' = A/B$ user classes, and equivalent channel path gains $\beta_i'^2 = \sum_{m=1}^{B} \beta_{m,k}^2$ given by the combination
of the path gains from all BSs in the cluster to the user groups in the $i$-th equivalence class.

As far as the per-BS power constraint is concerned, the power constraint in problem (36) must be replaced
by (25), where the coefficients $\{\theta_{m,k}(\boldsymbol{\mu})\}$ are provided by Theorem 2. Finally, in the symmetric case,
using Lemma 1 and assuming $P_m = P$ for all $m$, the per-BS power constraint reduces to

$$
\sum_{i=1}^{A'} q'_i \mu'_i \le P, \qquad \text{for all } m = 1, \ldots, B.
$$

Notice that the above set of constraints is identical for all BSs, and by summing over $m$ we obtain the
equivalent constraint $B \sum_{i=1}^{A'} q'_i \mu'_i \le BP = P_{\rm sum}$, which coincides with the sum power constraint in
(38), as anticipated before.



C. Optimization of the User Fractions and Powers 



While (36) is still a non-convex problem in $(\mathbf{q}, \boldsymbol{\mu})$, we can find near-optimal solutions by borrowing
from the greedy user selection heuristic used in the finite-dimensional case (see [31], [32]). In particular,
we consider incrementing the user fractions $\boldsymbol{\mu}$ sequentially in very small steps, $\Delta\mu \ll 1$,
until the objective function value cannot be increased any further. As $\Delta\mu$ becomes infinitesimal,
this is equivalent to greedy user selection in the large-system limit, where $\Delta\mu$, denoting the fraction of
one user relative to the total number of users, goes to zero. We start from $\boldsymbol{\mu} = \mathbf{0}$ and at each step we find the $k$
such that incrementing $\mu_k$ by $\Delta\mu$ yields the largest improvement and the resulting new $\boldsymbol{\mu}$ is feasible.
For the tentative configuration of the fractions $\boldsymbol{\mu}$, the corresponding power allocation is obtained from
the water-filling solution. We stop when no further increment can improve the objective function value.
The detailed description is as follows:

1) Initialize variables such that $n = 0$, $R_{\rm wsr}(0) = 0$, $\boldsymbol{\mu} = \mathbf{0}$, and $\mu = 0$.

2) Set $n \leftarrow n + 1$. For $\Delta\mu \ll 1$, set $\boldsymbol{\mu}^{(k)} = \boldsymbol{\mu} + \Delta\mu\, \mathbf{e}_k$ (note: $\mathbf{e}_k$ denotes a vector of length $A$ of
all zeros with a single 1 in position $k$), for $k \in \mathcal{S} = \{j : \mu_j + \Delta\mu \le 1\}$. If $\mathcal{S}$ is empty or
$\mu + \Delta\mu > \gamma$, then exit and keep the current $\boldsymbol{\mu}$ and the corresponding rates as the final values of the
algorithm. Otherwise, compute the tentative weighted sum rate value $R_{\rm wsr}^{(k)}$ for each $k \in \mathcal{S}$, by solving
the optimization problem in (36) for fixed $\boldsymbol{\mu}^{(k)}$ with the water-filling power allocation.

3) Let $\hat{k} = \arg\max_{k \in \mathcal{S}} R_{\rm wsr}^{(k)}$ and set $R_{\rm wsr}(n) = R_{\rm wsr}^{(\hat{k})}$.

4) If $R_{\rm wsr}(n) > R_{\rm wsr}(n - 1)$, then set $\boldsymbol{\mu} \leftarrow \boldsymbol{\mu}^{(\hat{k})}$, $\mu \leftarrow \mu + \Delta\mu$ and go back to step 2.

5) Otherwise, if $R_{\rm wsr}(n) \le R_{\rm wsr}(n - 1)$, exit and take the current $\boldsymbol{\mu}$ and the corresponding rates as
the final values of the algorithm.

Fig. 2. Cluster ($B = 2$) sum rate of the proposed optimization algorithm as $\mu'$ increases from 0 to $\gamma = 4$.



Fig. 2 shows the sum rate versus $\mu' = \mu/B$ for the symmetric system of Example 1 with $P = 15$
dB and $W_i = 1$, $\forall i$. The optimization with respect to $\boldsymbol{\mu}'$ and $\mathbf{q}'$ is obtained by applying the algorithm
described above, and the value of the objective function (sum rate) is compared with the globally optimal
value obtained from exhaustive search. The exhaustive search finds the optimal weighted sum rate by
searching over $\boldsymbol{\mu}' \in [0, 1]^{A'}$, subject to $\sum_{i=1}^{A'} \mu'_i \le \gamma$. If we discretize this domain with step size $\Delta\mu$
in each dimension, the computational complexity of the exhaustive algorithm is $O((1/\Delta\mu)^{A'})$, whereas
the proposed greedy algorithm has complexity $O(A'\gamma/\Delta\mu)$. In order to obtain the curve in Fig. 2, we
removed the comparison between $R_{\rm wsr}(n)$ and $R_{\rm wsr}(n - 1)$ in step 4, so that the algorithm does not
stop when it cannot improve the objective function value any longer, and returns a value of the objective
function over the whole range $\mu' \in [0, \gamma]$. When $\Delta\mu = 0.01$, the greedy algorithm achieves the same
optimal value as the exhaustive search, at $\mu' = 2.76$.
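The greedy procedure of steps 1)-5) can be sketched for the symmetric problem (38) as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it works with the equivalent single-BS system (per-class gains $(\gamma - \mu')\beta_i'^2$ and per-BS power budget `p`), drops the constant factor $B$ in the objective, and uses natural logarithms. All function names and the numerical settings (class gains derived from Example 1, `p` set to 15 dB in linear scale, step 0.02) are choices made for the example, so the resulting operating point need not match the value reported in the text.

```python
import math

def waterfill(w, mu, gains, p, iters=80):
    """Powers q_i = max(w_i/lam - 1/gains_i, 0) with sum_i mu_i q_i = p (bisection on lam)."""
    act = [i for i in range(len(mu)) if mu[i] > 0 and gains[i] > 0 and w[i] > 0]
    if not act:
        return [0.0] * len(mu)
    used = lambda lam: sum(mu[i] * max(w[i] / lam - 1.0 / gains[i], 0.0) for i in act)
    lo, hi = 1e-12, max(w[i] * gains[i] for i in act)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if used(mid) > p else (lo, mid)
    lam = 0.5 * (lo + hi)
    return [max(w[i] / lam - 1.0 / gains[i], 0.0) if i in act else 0.0
            for i in range(len(mu))]

def weighted_sum_rate(w, mu, beta2, gamma, p):
    """Objective of the symmetric problem (without the factor B) for fixed fractions mu."""
    gains = [max(gamma - sum(mu), 0.0) * b for b in beta2]
    q = waterfill(w, mu, gains, p)
    return sum(w[i] * mu[i] * math.log(1.0 + gains[i] * q[i]) for i in range(len(mu)))

def greedy_fractions(w, beta2, gamma, p, step=0.02):
    """Increase one class fraction per iteration while the objective keeps improving."""
    mu, best = [0.0] * len(beta2), 0.0
    while True:
        cands = [i for i in range(len(mu)) if mu[i] + step <= 1.0 + 1e-12]
        if not cands or sum(mu) + step > gamma + 1e-12:
            break
        trials = {}
        for i in cands:
            trial = mu[:]
            trial[i] += step
            trials[i] = weighted_sum_rate(w, trial, beta2, gamma, p)
        i_best = max(trials, key=trials.get)
        if trials[i_best] <= best:
            break
        mu[i_best] += step
        best = trials[i_best]
    return mu, best

# Class gains beta'_i^2 built from the Example 1 coefficients (a, f), (b, e), (c, e), (c, d).
beta2 = [1.5**2 + 0.2**2, 1.3**2 + 0.3**2, 1.0**2 + 0.3**2, 1.0**2 + 0.5**2]
w = [1.0, 1.0, 1.0, 1.0]
gamma, p = 4.0, 10 ** (15 / 10)
mu_opt, rate = greedy_fractions(w, beta2, gamma, p)
```

Each outer iteration evaluates at most $A'$ candidates, which is where the $O(A'\gamma/\Delta\mu)$ complexity quoted above comes from.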



D. Network Utility Function Maximization 

In general, the solution of (|6]) (or ([36]) in the large system limit) for the case A > (more users than 
antennas) yields an unbalanced distribution of instantaneous rates, where some user classes are not served
at all (we have $\mu_k = 0$ for some $k$). Hence, for a general strictly concave network utility function $g(\cdot)$,
the ergodic rate region $\mathcal{R}$ requires time-sharing even in the asymptotic large-system case. Therefore, the
network utility maximization problem (5) is not generally amenable to a closed-form solution. However,
the solution can be computed to any level of accuracy by using a method inspired by the stochastic
optimization approach of [20]. Interestingly, the same algorithm can be used both for the computation of
the optimum throughput point in the large-system limit, and for the actual downlink scheduling algorithm,
applied on a slot-by-slot basis to the actual finite-dimensional system. In the former case, the algorithm
is equivalent to a Lagrangian iteration where the "virtual queues" (to be defined in the following) play
the role of Lagrange multipliers. In the latter case, when applied to the finite-dimensional system, the
algorithm performs a stochastic "Lyapunov drift" optimization (see [20]).

For each user group $k = 1, \ldots, A$, we define a virtual queue that evolves according to

$$
Q_k(t + 1) = [Q_k(t) - r_k(t)]_+ + a_k(t), \qquad (39)
$$

where rk{t) denotes the virtual service rate and ak{t) the virtual arrival process. The queues are initialized 
by Qfc(O) = ^/c(0) = 0. At each algorithm iteration t = 1, 2, . . ., the virtual arrival processes is given by 
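$a_k(t) = a_k^\star$, where $\mathbf{a}^\star$ is the solution of the convex programming problem:

$$
\begin{aligned}
\text{maximize} \quad & V g(\mathbf{a}) - \sum_{k=1}^{A} Q_k(t)\, a_k \\
\text{subject to} \quad & 0 \le a_k \le a_{\max}, \quad \forall\, k \qquad (40)
\end{aligned}
$$

and where $V, a_{\max} > 0$ are suitably chosen constant parameters that determine the convergence properties
of the algorithm. The service rates are given by

$$
r_k(t) = \mu_k(t) \log\left(1 + \gamma \left(\sum_{m=1}^{B} \beta_{m,k}^2 \eta_m(t)\right) q_k(t)\right),
$$

where $(\boldsymbol{\mu}(t), \mathbf{q}(t), \boldsymbol{\eta}(t))$ is the solution of (36) for weights $W_k = Q_k(t)$. Then, the virtual queues are
updated according to (39). The theory developed in [20] (see also [21]) ensures the following result. Let
$\mathbf{r}(t)$ denote the vector of service rates generated by the above iterative algorithm. Then,

$$
\liminf_{t \to \infty}\; g\left(\frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{r}(\tau)\right) \ge g(\mathbf{R}^\star) - \frac{\mathcal{K}}{V}, \qquad (41)
$$

where $\mathbf{R}^\star$ is the solution of (5) and $\mathcal{K}$ is a constant that depends on the system parameters and on $a_{\max}$.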



In particular, using the results in pT| we can show the bound 

^ < ^ (^max + log^ (1 + 7niax{/3^ : V m, fcjPsum)) 

By choosing $V$ and $a_{\max}$ appropriately, a desired tradeoff between the accuracy of the approximation of
the optimum point and the convergence speed of the iterative algorithm can be ensured.

It should be noticed that if we use the greedy optimization of the user fractions as described in
Section III-C instead of the exact solution of (38), then the performance guarantee (41) is no longer
valid. However, the algorithm ensures that the throughput point that maximizes $g(\cdot)$ over the ergodic
rate region achievable with the (suboptimal) greedy optimization of the user fractions can be approached
arbitrarily closely.
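The virtual-queue iteration (39)-(40) can be illustrated with a small self-contained sketch. Two simplifications are assumptions of this example and not part of the paper: the utility is proportional fairness, $g(\mathbf{a}) = \sum_k \log a_k$, for which (40) has the closed-form solution $a_k = \min(V/Q_k, a_{\max})$, and the weighted sum-rate step is replaced by a toy time-sharing oracle that serves the single group maximizing $Q_k c_k$ at a hypothetical peak rate $c_k$. For this toy region the proportionally fair point is $c_k/K$, which gives a reference value to compare against.

```python
def virtual_queue_pf(c, V=100.0, a_max=10.0, T=20000):
    """Virtual-queue iteration with PF utility and a toy max-weight service oracle.

    c[k] is the (hypothetical) peak rate of group k when it is served alone.
    Returns the time-averaged service rates."""
    K = len(c)
    Q = [0.0] * K
    served = [0.0] * K
    for _ in range(T):
        # Arrivals: maximize V*sum(log a_k) - sum(Q_k a_k) subject to 0 <= a_k <= a_max.
        a = [a_max if Q[k] * a_max <= V else V / Q[k] for k in range(K)]
        # Service: stand-in for the weighted sum-rate problem with weights W_k = Q_k.
        k_star = max(range(K), key=lambda k: Q[k] * c[k])
        r = [c[k] if k == k_star else 0.0 for k in range(K)]
        for k in range(K):
            served[k] += r[k]
            Q[k] = max(Q[k] - r[k], 0.0) + a[k]   # queue update as in (39)
    return [s / T for s in served]

c = [1.0, 2.0, 4.0]
avg_rates = virtual_queue_pf(c)
pf_point = [ck / len(c) for ck in c]   # PF optimum of the toy time-sharing region
```

Increasing `V` tightens the gap to the optimum at the price of a longer transient, which is the accuracy versus convergence-speed tradeoff mentioned above.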

IV. Channel Estimation and Non-Perfect CSIT 

So far, we have assumed that the transmitter (cluster controller) has perfect CSIT. In this section we
consider the typical operation of a Frequency Division Duplexing (FDD) system, where the BSs
in each cluster acquire their CSIT by broadcasting a set of orthogonal downlink pilot signals (one signal
per jointly coordinated antenna), in order to enable the users to measure their downlink channels and



feed back the estimated channel state information in some form p7| . The noisy CSIT obtained by the 
training and feedback procedure is used at the cluster controller to compute the LZFB precoding matrix,
to select the active users, and to allocate power and rate. We seek the characterization of the non-trivial
tradeoff between the advantage of having a large number of jointly processed transmit antennas (large
$\gamma$ and/or large $B$) and the overhead required for estimating the channels. We assume that the channels
are constant over time-frequency blocks of size $WT$ complex dimensions, where $W$ denotes the
system coherence bandwidth (in Hz) and $T$ denotes the system coherence time (in sec). These blocks
are identified with the scheduling slots of our downlink system. For each slot, $\gamma_p BN$ dimensions are dedicated
to downlink training, in order to allow all users in the cluster to estimate the composite channel (i.e., the
corresponding column of $\mathbf{H}$ in (4)) formed by $\gamma BN$ coefficients. Since the channel vectors are Gaussian,
linear MMSE estimation is optimal with respect to the MSE criterion. A simple dimensionality argument
shows that the MSE can be made arbitrarily small as $\sigma^2 \to 0$ (vanishing noise plus ICI) if and only
if $\gamma_p \ge \gamma$. The ratio $\gamma_p/\gamma \ge 1$ denotes the "pilot dimensionality overhead", relative to the minimum
number of pilots for which the MMSE vanishes as $\sigma^2 \to 0$.

Focusing on the estimation of a generic column of $\mathbf{H}$ in (4) corresponding to some user $j$ in group $k$
of the reference cluster, the channel model of downlink channel estimation based on the common pilots
is given by

$$
\mathbf{y}_k^{(j)} = \mathbf{T} \mathbf{h}_k^{(j)} + \mathbf{z}_k^{(j)}, \qquad (42)
$$

where $\mathbf{T}$ is a $\gamma_p BN \times \gamma BN$ training matrix with equal-energy orthogonal columns, corresponding to
the training sequences sent in parallel from the $\gamma BN$ cluster antennas (notice that the vertical dimension
corresponds to channel uses, and the horizontal dimension corresponds to the antennas), the vector
$\mathbf{h}_k^{(j)}$ is the corresponding channel vector, obtained by stacking the channel vectors (including their path
coefficients) corresponding to the different BSs in the cluster, and $\mathbf{z}_k^{(j)}$ is a vector of i.i.d. $\mathcal{CN}(0, 1)$ noise
plus interference samples. As before, we index the BSs forming the reference cluster as $m = 1, \ldots, B$
and the user groups as $k = 1, \ldots, A$. With this notation, from (4) we have

$$
\mathbf{h}_k^{(j)} = \left[ \beta_{1,k} \mathbf{h}_{1,k}^{(j)\mathsf{T}}, \ldots, \beta_{B,k} \mathbf{h}_{B,k}^{(j)\mathsf{T}} \right]^{\mathsf{T}}, \qquad (43)
$$

where $\mathbf{h}_{m,k}^{(j)}$ denotes the $j$-th column of the block $\mathbf{H}_{m,k}$, with i.i.d. $\mathcal{CN}(0, 1)$ elements.

The equal-energy and orthogonality condition on the columns of T yield that the total transmit power 
(energy per channel use) in the training phase is given by 



1 



-tr ( T^T 



7 



where we let t'^T = pi, and p denotes the energy of the training sequences. Letting the total training 
power equal to the total cluster transmit power, we obtain 

B 



p 



7 



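Noticing that

$$
\mathrm{Cov}(\mathbf{h}_k^{(j)}) = \mathrm{diag}\left(\beta_{1,k}^2 \mathbf{I}, \ldots, \beta_{B,k}^2 \mathbf{I}\right) \triangleq \mathbf{D}_k
$$

has block-diagonal structure with diagonal blocks given by scaled $\gamma N \times \gamma N$ identity matrices, we
immediately obtain the MMSE estimator of $\mathbf{h}_k^{(j)}$ in the form

$$
\hat{\mathbf{h}}_k^{(j)} = \mathbf{D}_k \left(\mathbf{I} + \rho \mathbf{D}_k\right)^{-1} \mathbf{T}^{\mathsf{H}} \mathbf{y}_k^{(j)}, \qquad (45)
$$

with estimation error covariance given by

$$
\boldsymbol{\Sigma}_k = \mathbf{D}_k - \rho \mathbf{D}_k \left(\mathbf{I} + \rho \mathbf{D}_k\right)^{-1} \mathbf{D}_k = \mathbf{D}_k \left(\mathbf{I} + \rho \mathbf{D}_k\right)^{-1}. \qquad (46)
$$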

The MMSE covariance matrix is also block diagonal, with scaled identities diagonal blocks, and it depends 
only on the user group index k and not on the individual user in the group (this is expected, since the 
users in the same group are statistically equivalent). 
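Since all covariances are block diagonal with scaled identities, the estimation problem decouples per channel coefficient. The following sketch, with illustrative names and arbitrary test values, computes the per-coefficient estimate and error variances implied by the error covariance above, and cross-checks the error variance by a short Monte Carlo run. It assumes the scalar observation model obtained after correlating with the pilots, i.e., one entry of $\mathbf{T}^{\mathsf{H}}\mathbf{y}$ equals $\rho h + w$ with $w \sim \mathcal{CN}(0, \rho)$, which follows from $\mathbf{T}^{\mathsf{H}}\mathbf{T} = \rho \mathbf{I}$.

```python
import random

def mmse_variances(beta2, rho):
    """Per-coefficient variances for a channel entry with variance beta2 and pilot energy rho."""
    est_var = rho * beta2 ** 2 / (1.0 + rho * beta2)   # variance of the MMSE estimate
    err_var = beta2 / (1.0 + rho * beta2)              # variance of the estimation error
    return est_var, err_var

def complex_gaussian(rng, var):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def simulated_error_variance(beta2, rho, n=100000, seed=0):
    """Monte Carlo estimate of E|h - h_hat|^2 for the scalar model T^H y = rho*h + w."""
    rng = random.Random(seed)
    gain = beta2 / (1.0 + rho * beta2)                 # scalar version of D (I + rho D)^{-1}
    acc = 0.0
    for _ in range(n):
        h = complex_gaussian(rng, beta2)
        y = rho * h + complex_gaussian(rng, rho)
        acc += abs(h - gain * y) ** 2
    return acc / n

beta2, rho = 2.25, 4.0
est_var, err_var = mmse_variances(beta2, rho)
err_var_mc = simulated_error_variance(beta2, rho)
```

The two variances add up to the channel variance, which is the scalar counterpart of the orthogonal decomposition of the channel into estimate plus error.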

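From the well-known orthogonality condition of MMSE estimation and from joint Gaussianity, we
have the canonical decomposition

$$
\mathbf{h}_k^{(j)} = \hat{\mathbf{h}}_k^{(j)} + \mathbf{e}_k^{(j)}, \qquad (47)
$$

where the estimator $\hat{\mathbf{h}}_k^{(j)}$ and the error $\mathbf{e}_k^{(j)}$ are independent, and such that

$$
\mathrm{Cov}(\hat{\mathbf{h}}_k^{(j)}) = \mathbf{D}_k - \boldsymbol{\Sigma}_k = \rho \mathbf{D}_k \left(\mathbf{I} + \rho \mathbf{D}_k\right)^{-1} \mathbf{D}_k. \qquad (48)
$$

Eventually, we can write the channel matrix $\mathbf{H}$ in (4) in the form $\mathbf{H} = \hat{\mathbf{H}} + \mathbf{E}$, where

$$
\hat{\mathbf{H}} = \begin{bmatrix}
\hat\beta_{1,1} \hat{\mathbf{H}}_{1,1} & \cdots & \hat\beta_{1,A} \hat{\mathbf{H}}_{1,A} \\
\vdots & \ddots & \vdots \\
\hat\beta_{B,1} \hat{\mathbf{H}}_{B,1} & \cdots & \hat\beta_{B,A} \hat{\mathbf{H}}_{B,A}
\end{bmatrix}, \qquad (49)
$$

with

$$
\hat\beta_{m,k} = \sqrt{\frac{\rho \beta_{m,k}^4}{1 + \rho \beta_{m,k}^2}}, \qquad (50)
$$

and the blocks $\hat{\mathbf{H}}_{m,k}$ are independent with i.i.d. $\mathcal{CN}(0, 1)$ elements, and where $\mathbf{E}$ is independent of $\hat{\mathbf{H}}$,
and is given in the form

$$
\mathbf{E} = \begin{bmatrix}
\bar\beta_{1,1} \mathbf{E}_{1,1} & \cdots & \bar\beta_{1,A} \mathbf{E}_{1,A} \\
\vdots & \ddots & \vdots \\
\bar\beta_{B,1} \mathbf{E}_{B,1} & \cdots & \bar\beta_{B,A} \mathbf{E}_{B,A}
\end{bmatrix}, \qquad (51)
$$

with

$$
\bar\beta_{m,k} = \sqrt{\beta_{m,k}^2 - \hat\beta_{m,k}^2} = \frac{\beta_{m,k}}{\sqrt{1 + \rho \beta_{m,k}^2}}, \qquad (52)
$$

and the blocks $\mathbf{E}_{m,k}$ are independent with i.i.d. $\mathcal{CN}(0, 1)$ elements.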

In a practical FDD system, the users should feed back their estimated channel on each time-frequency 
block, i.e., for each new observation. Several schemes have been proposed for closed-loop CSIT feed- 
back, including codebook-based vector quantization, scalar quantization of the channel coefficients, and 
unquantized "analog" feedback. CSIT feedback takes place on the uplink, and can be performed by 
accessing the uplink channel in FDMA/TDMA, or exploiting the MIMO-MAC nature of the uplink in 
order to allow a number of users proportional to the number of receiving antennas to send their feedback 
signals simultaneously (see p7|, [[46|, ||47|| and references therein). Analyzing the system in the presence 



of a specific feedback scheme is possible p7| . However, from the results in the above mentioned papers 
we know that a well-designed digital feedback scheme can achieve a quantization error that is negligible
with respect to the downlink training estimation error. Furthermore, this can be done with a moderate use
of the total uplink feedback capacity, provided that the number of users feeding back their CSIT is not
too large (see for example the optimization tradeoff in [48]). For the sake of simplicity, here we assume
an ideal genie-aided CSIT feedback that provides $\hat{\mathbf{H}}$ directly to the centralized cluster controller at no
additional cost, either in terms of rate or in terms of CSIT distortion. This provides a "best case" for
any scheme based on explicit downlink training and CSIT feedback. In the next section, we propose a
randomized scheduling scheme that pre-selects at each slot a subset of users, such that only these selected
users need to feed back their CSIT, thus limiting the cost of CSIT feedback on the uplink capacity.

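As said before, the cluster controller computes a mismatched LZFB precoding matrix from the estimated
channel matrix $\hat{\mathbf{H}}$ instead of $\mathbf{H}$. The following theorem yields an achievability lower bound on the large-system
performance of the mismatched LZFB:

Theorem 3: Under the downlink training scheme described above and assuming genie-aided CSIT
feedback, the achievable rate of users in group $k$ is lower bounded by

$$
R_k \ge \log\left(1 + \frac{\Lambda_k(\boldsymbol{\mu})\, q_k}{1 + \sum_{m=1}^{B} \bar\beta_{m,k}^2 P_m}\right), \qquad (53)
$$

where $\bar\beta_{m,k}^2 = \beta_{m,k}^2 / (1 + \rho \beta_{m,k}^2)$ is the per-coefficient estimation error variance, and where

$$
\Lambda_k(\boldsymbol{\mu}) = \gamma \sum_{m=1}^{B} \hat\beta_{m,k}^2\, \eta_m(\boldsymbol{\mu}), \qquad (54)
$$

where $(\eta_1(\boldsymbol{\mu}), \ldots, \eta_B(\boldsymbol{\mu}))$ is the unique solution with components in $[0, 1]$ of the fixed-point equation

$$
\eta_m = 1 - \sum_{q=1}^{A} \mu_q \frac{\eta_m \hat\beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \hat\beta_{\ell,q}^2}, \qquad m = 1, \ldots, B, \qquad (55)
$$

with respect to the variables $\boldsymbol{\eta} = \{\eta_m\}$.

Proof: See Appendix D. ■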
It is immediate to see that all the derivations and the optimization made before for the case of perfect
CSIT, including the system symmetry conditions, can be applied straightforwardly to the case of non-ideal
CSIT, provided that the per-user rates are replaced by the corresponding terms in (53). In particular,
Theorem 2 is valid by replacing $\beta_{m,k}$ with $\hat\beta_{m,k}$ given in (50).

Finally, the system spectral efficiency must be scaled by the factor $1 - \frac{\gamma_p B N}{WT}$, which takes into
account the downlink training overhead, i.e., the fraction of dimensions per block dedicated to training.
In particular, letting $r = \frac{N}{WT}$ denote the ratio between the number of users per group, $N$, and the
dimensions in a time-frequency slot, we can investigate the system spectral efficiency for fixed $r$, in the
limit of $N \to \infty$. The ratio $r$ captures the "dimensional crowding" of the system. It is clear that a highly
underspread system ($WT \gg 1$) can accommodate more users and more jointly coordinated antennas at
the transmitter. Vice versa, if $WT$ is not much larger than $N$, then the number of jointly coordinated
transmit antennas (captured by the product $\gamma B$) is intrinsically limited by the channel time-frequency
coherence.

V. Numerical Results and Probabilistic Scheduling

In this section, we first provide a comparison between the large-system limit analytical results and
the Monte Carlo simulation of the corresponding finite-dimensional systems with greedy user selection.
Then, driven by the behavior of the finite-dimensional system, we propose a simplified probabilistic
scheduling algorithm that randomly pre-selects users according to the probabilities obtained from the
asymptotic analysis. While greedy user selection requires that a large number of users (much larger than the
number of scheduled active users) feed back their CSIT, in the proposed scheme the CSIT feedback is restricted
to the users that are effectively served. While a precise quantification of the feedback resource requirements
is beyond the scope of this paper, we argue that the proposed scheme yields significant savings in the uplink
feedback capacity, while achieving good throughput and fairness performance, when the system dimension
is large. Finally, we consider the impact of non-perfect CSIT on the system performance and investigate
the tradeoff between increasing the number of jointly coordinated antennas and the dimensionality cost
incurred by downlink training for channel estimation.

We consider a linear cellular arrangement where $M$ BSs are equally spaced on the segment $[-M, M]$
km, in positions $2m - M - 1$ for $m = 1, \ldots, M$, and $K$ user groups are also equally spaced on the same
segment, with $K/M$ user groups uniformly spaced in each cell. The distance $d_{m,k}$ between BS $m$ and
user group $k$ is defined modulo $[-M, M]$, i.e., we assume a wrap-around topology in order to eliminate
boundary effects. We use a distance-dependent pathloss model given by $\alpha_{m,k}^2 = G_0 / (1 + (d_{m,k}/\delta)^\nu)$,
where the parameters $G_0$, $\nu$, and $\delta$ follow the mobile WiMAX system evaluation specifications fTSl, such
that the 3 dB break point is $\delta = 36$ m (i.e., 3.6% of the 1 km cell radius), the pathloss exponent is $\nu = 3.504$,
the reference pathloss at $d_{m,k} = \delta$ is $G_0 = -91.64$ dB, and the per-BS transmit power normalized by
the noise power at user terminals is $P = 154$ dB.
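The geometry and pathloss model above can be sketched in a few lines. This is an illustrative helper, not code from the paper: the function names are hypothetical, distances are in km, and the pathloss follows the stated formula $G_0 / (1 + (d/\delta)^\nu)$ literally, so that the gain at $d = \delta$ is 3 dB below $G_0$. The exact user group positions are not specified in the text and are therefore left out.

```python
import math

def bs_positions(M):
    """BS locations (km) on the segment [-M, M]: 2m - M - 1 for m = 1..M."""
    return [2 * m - M - 1 for m in range(1, M + 1)]

def wrap_distance(x, y, M):
    """Distance (km) between two points on [-M, M] with wrap-around (period 2M)."""
    d = abs(x - y) % (2 * M)
    return min(d, 2 * M - d)

def pathloss_gain(d_km, g0_db=-91.64, delta_km=0.036, nu=3.504):
    """Linear-scale pathloss gain G0 / (1 + (d / delta)^nu)."""
    g0 = 10.0 ** (g0_db / 10.0)
    return g0 / (1.0 + (d_km / delta_km) ** nu)

positions = bs_positions(8)
d_edge = wrap_distance(positions[0], positions[-1], 8)   # first and last BS are neighbors
gain_at_break_db = 10.0 * math.log10(pathloss_gain(0.036))
```

With wrap-around, the first and last BSs end up 2 km apart, like any other pair of adjacent BSs, which removes boundary effects.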

1) Comparison with finite-dimensional systems: Fig. 3 shows the average user throughputs (bit/s/Hz)
versus user locations for the first two cells near the origin (given the symmetry, this pattern repeats
periodically), for the case of $M = 8$ cells, $K = 64$ user groups, cluster size $B = 1$, 2, and 8, and $\gamma = 4$.
Notice that with 8 user groups per cell and $\gamma = 4$, we have twice as many users as antennas in each cell.
The case $B = 8$ corresponds to network-wide full cooperation. For the finite-dimensional Monte Carlo
simulation, we applied the same stochastic optimization algorithm described in Section III-D, where now
$t$ denotes the scheduling slot index, and for each $t$ a new set of i.i.d. channel vectors is generated. In this
case, the instantaneous weighted sum-rate is obtained via the user selection algorithm of [31], assuming
that the CSIT for all users in the system is available at the cluster controllers. As far as the network
utility function $g(\cdot)$ is concerned, we consider the Proportional Fairness (PF) criterion, corresponding to
$g(\mathbf{R}) = \sum_k \log R_k$. This PF criterion is applied to all the numerical results in this section.

From Fig. 3(a), we notice that the advantage of full cooperation is significant, whereas the cluster of
size $B = 2$ yields a significant improvement for the users in the center of the cluster, with respect to
the basic cellular system with no cluster cooperation ($B = 1$). In Fig. 3(b), we compare the asymptotic
results with the finite-dimensional simulation results in the case of $B = 2$. The finite-dimensional system
yields better per-user throughput than the large-system limit, thanks to the ability of the user selection
to exploit the randomness in the instantaneous channel realizations (multiuser diversity). However, as
the number of users at each location, $N$, grows, the multiuser diversity gain declines. For
example, the relative gain of the finite-dimensional rate over the asymptotic rate is about 55% for $N = 1$
but only 25% for $N = 8$. In the cases of $B = 1$ and $B = 8$, which are not shown here, the same trends are
observed, even though the diversity gain is slightly larger ($B = 1$) or smaller ($B = 8$). It is well known
that for large systems (large $N$), this multiuser diversity effect disappears because of "channel hardening".

2) Random user preselection scheme for reduced CSIT feedback: User selection requires a large
amount of CSIT feedback since it needs CSIT from many users in order to select a good subset at
each scheduling slot, even though no more users than the number of antennas can be served at a time.
For systems with finite but large size, it is not wise to have many more users than transmit antennas
feed back their CSIT, since the multiuser diversity effect becomes marginal whereas the feedback resource
grows at least linearly with the number of users feeding back their CSIT at each slot. In this regime, a
meaningful option consists of pre-selecting the users to be served in each slot, such that only these users
feed back their CSIT. In this case, we have to design a user pre-selection scheme that approximately
maximizes the desired network utility function. For example, a simple round-robin scheme may perform
far from the desired fairness optimal point.

[Fig. 3, panel (b): Finite dimension simulation for $B = 2$]

Fig. 3. User rate under perfect CSIT, obtained from asymptotic analysis for cooperation clusters of size $B = 1$ (no cooperation),
2, and 8 (full cooperation) and from finite dimension simulation with greedy user selection for $B = 2$ and $N = 1$, 2, 4, and 8.
$M = 8$ cells and $K = 64$ user groups.

Fig. 4. User rate under perfect CSIT in finite dimension ($N = 2$, 4, and 8) with random user selection and power allocation
aided by the asymptotic results for cooperation clusters of size $B = 1$, 2, and 8. $M = 8$ cells and $K = 64$ user groups.

For this purpose, we consider a probabilistic scheduling scheme based on our asymptotic analysis that
effectively provides such user pre-selection. In the proposed scheme, the users from which CSIT feedback
is requested are randomly selected in each slot $t$ as follows: let $\{\mu_k\}$ be the user fractions per group of
(approximately) co-located users, which are obtained from the asymptotic analysis. The cluster controller
has a maximum of $\gamma BN$ independent data streams to transmit using LZFB (equal to the number of
jointly coordinated transmit antennas). At each slot $t$, the scheduler generates $\gamma BN$ i.i.d. random variables
$S_1(t), \ldots, S_{\gamma BN}(t)$, taking values on the integers $\{0, 1, \ldots, A\}$ with probability $\mathbb{P}(S_i(t) = k) = \frac{\mu_k}{\gamma B}$ for
$k = 1, \ldots, A$ and $\mathbb{P}(S_i(t) = 0) = 1 - \sum_{k=1}^{A} \frac{\mu_k}{\gamma B}$. Then, user group $k$ is served by stream $i$ at slot $t$ if $S_i(t) = k$.
Notice that streams $i$ for which $S_i(t) = 0$ are not used and that multiple streams may be associated with
the same user group. Finally, for each stream a user in the associated group is selected at random, making
sure that streams serve distinct users. Once the allocation of streams to users is determined, the selected
users are requested to feed back their CSIT and the scheduler optimizes the transmit powers by solving
the weighted sum rate maximization problem with weights $W_k = \partial g(\mathbf{R})/\partial R_k$, corresponding to the
optimal asymptotic throughput point. In the special case of PF scheduling, this is given by $W_k = 1/R_k$.
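The random pre-selection step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' scheduler: each of the $\gamma BN$ streams is assigned to group $k$ with probability $\mu_k/(\gamma B)$ (or left unused), and the users within a group are then drawn without replacement so that streams serve distinct users. Capping the number of streams per group at $N$ is a choice made here for the rare case in which more streams than users fall on one group. The function name and the seed are illustrative.

```python
import random

def preselect_users(mu, gamma, B, N, rng):
    """Randomly assign streams to user groups and pick distinct users per group.

    Returns a dict mapping group index (1..A) to the list of selected user indices."""
    A = len(mu)
    n_streams = int(round(gamma * B * N))
    p = [m / (gamma * B) for m in mu]          # P(S_i = k), k = 1..A
    p_idle = max(1.0 - sum(p), 0.0)            # P(S_i = 0): stream left unused
    draws = rng.choices(range(A + 1), weights=[p_idle] + p, k=n_streams)
    counts = [0] * A
    for s in draws:
        if s > 0:
            counts[s - 1] += 1
    selected = {}
    for k in range(A):
        n_k = min(counts[k], N)                # at most N distinct users per group
        selected[k + 1] = rng.sample(range(N), n_k)
    return selected

mu = [0.5, 0.5, 0.75, 1.0, 0.5, 0.5, 0.75, 1.0]   # fractions from Example 1
rng = random.Random(1)
selected = preselect_users(mu, gamma=4, B=2, N=10, rng=rng)
```

On average group $k$ receives $\mu_k N$ streams, so only about a fraction $\mu_k$ of its users needs to feed back CSIT in a given slot.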

The finite-dimension simulation results under this probabilistic user pre-selection scheme are compared
with the asymptotic results in Fig. 4, under the same system setting as in Fig. 3. As $N$ increases, the
finite-dimensional results converge to the infinite-dimensional limit and the curves almost overlap, especially
when $B = 1$ or 2. Hence, the proposed scheme is effective for systems of finite but moderately large
size.

3) Non-perfect CSIT and coordination vs. estimation tradeoff: Fig. 5 shows the cell sum rate (cluster
sum rate normalized by the number of cooperating cells in the reference cluster) versus values of $\gamma$ in
the cases of (a) perfect CSIT and no consideration of training overhead, and (b) non-perfect CSIT and
explicit downlink training with $\gamma_p = \gamma$. We consider a larger number of user groups, $K = 192$ in the
$M = 8$ cells. As shown in Fig. 5(a), under the assumption of perfect CSIT given at no cost, the cell
sum rate grows almost linearly as $\gamma$ (the ratio of BS antennas over the users per group) increases, and
also grows as $B$ (cluster size) increases, which shows the gain due to inter-cell cooperation and to the larger number of
transmit antennas. However, when the CSIT estimation error and downlink training overhead are
taken into account, there is a non-trivial tradeoff between the improvement owing to more and more
jointly coordinated transmit antennas and the cost of estimating higher and higher dimensional channels.

Notice that this tradeoff is "fundamental" in the following sense: a trivial upper bound on the achievable
sum capacity of the reference cluster is obtained by letting all users perfectly cooperate as a single
multi-antenna receiver. The capacity of the resulting block-fading single-user MIMO channel with $\gamma BN$
transmit antennas, $AN$ receive antennas, and fading coherence block $WT = N/r$ was characterized
in the high-SNR regime in [49], [50]. Using this result, in the case $\frac{1}{2r} \ge A \ge \gamma B$, the dimensionality
"pre-log" loss factor with respect to the case of ideal CSIT is given by $(1 - \gamma B r)$, which coincides
with the factor given at the end of Section IV for the choice $\gamma_p = \gamma$. In fact, the "pre-log" optimality
of explicit training for single-user MIMO channels with block fading in the high-SNR regime is well
known [50]. Also, the same result shows that if $\frac{1}{2r} < \min\{A, \gamma B\}$, then there is no point in using
more than $WT/2$ jointly coordinated antennas. Finally, notice that the recently proposed schemes for
"blind" interference alignment [52], exploiting reconfigurable antennas at the user terminals, still require
channel state information at the receiver (CSIR) for coherent detection at each user terminal. Since the
resulting channel is MIMO point-to-point, the same downlink training described above is required. In other words,
these "blind" interference alignment schemes avoid CSIT feedback, but still require downlink training in
the same amount considered in this work. In conclusion, while we have analyzed a specific downlink
training scheme, we have that, for a cluster in isolation, the sum capacity scaling in the high-rate regime
(high-SNR) is indeed the correct one.
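The dimensionality argument can be made concrete with a short calculation. The sketch below is a high-SNR heuristic and not the finite-SNR sum rate computed in the paper: it assumes $\gamma_p = \gamma$ and $A \ge \gamma B$, so that the number of streams per unit $N$ is $\gamma B$ and the training overhead factor is $1 - \gamma B r$. The function names are illustrative.

```python
def training_overhead_factor(gamma_p, B, r):
    """Fraction of each time-frequency block left for data: 1 - gamma_p * B * r."""
    return max(1.0 - gamma_p * B * r, 0.0)

def prelog(gamma, B, r):
    """High-SNR pre-log per unit N with gamma_p = gamma, assuming A >= gamma * B."""
    return gamma * B * training_overhead_factor(gamma, B, r)

# For B = 2 and r = 1/64 the pre-log is maximized where gamma * B = 1 / (2 r) = 32.
best_gamma = max(range(1, 65), key=lambda g: prelog(g, 2, 1.0 / 64.0))
best_prelog = prelog(best_gamma, 2, 1.0 / 64.0)
```

The product $\gamma B (1 - \gamma B r)$ is a concave parabola in $\gamma B$, which is why adding antennas beyond $\gamma B = 1/(2r)$, i.e., beyond $WT/2$ antennas in total, reduces the pre-log.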

Fig. 5 shows the cell sum rate with consideration of training overhead and estimation error for $\gamma_p = \gamma$.
Inspired by practical system values, we chose $r = 1/64$ and $1/32$. In the finite-dimensional case, this
corresponds to $WT = 640$ or 320 signal dimensions, respectively, with $N = 10$ users per user group
(total $KN/M = 240$ users per cell). We notice that as $\gamma$ increases, the sum rates in most cases grow at
first, reach a maximum, and then decrease, due to the tradeoff between the benefit from a large
number of antennas and the training overhead cost. For given $B$ and $r$, the maximum sum rate is achieved
at $\gamma B = \frac{1}{2r}$, which is in line with the result of the non-coherent MIMO high-SNR regime when $\gamma B < A$.
For example, for $B = 2$ and $r = 1/64$, the sum rate is maximum at $\gamma = 16$, where $2\gamma = \frac{1}{2(1/64)} = 32$. For
$B = 1$ and $r = 1/64$, the optimal $\gamma$ is beyond the number of user groups per cell. We can also see
that, when the number of antennas is large, the no-cooperation case ($B = 1$) achieves the highest sum
rate for both $r = 1/64$ and $1/32$, which suggests that no cooperation gain can be expected, because the
improvement in multi-antenna gain does not compensate for the dimensional decrease (pre-log factor)
due to the training overhead.

Fig. 5. Cell sum rate versus antenna ratio $\gamma$ for cooperation clusters of size $B = 1$, 2, and 8: (a) perfect CSIT and no training overhead; (b) non-perfect CSIT and downlink training with $\gamma_p = \gamma$. $M = 8$ cells and $K = 192$ user
groups.

In order to see the best cluster size with downlink training and estimation, we consider a system with a
large number of cells, $M = 24$. Fig. 6 illustrates the cell sum rate versus the cluster size $B$ for $\gamma = 1, 2, 4$,
and $8$ and $\tau = 1/64$ and $1/32$, with the training-length parameter set equal to $\gamma$. In a linear cellular
arrangement with $M = 24$, the clusters, except for $B = 1$, $2$, or $24$, do not have the symmetric structure
described in Section III-A. For those clusters, we notice that the solution of problem (36) under the cluster
sum-power constraint produces an upper bound on the optimal value under the per-BS power constraint.
Even though not explicitly shown in the figure, we can confirm that the cluster sum rate ($B$ times the cell
sum rate) is maximized when $\gamma B = \frac{1}{2\tau}$, but as far as the cell sum rate is concerned, the optimal $\gamma$ for given
$B$ and $\tau$ is smaller than the optimal one in terms of the cluster sum rate, i.e., $\gamma B < \frac{1}{2\tau}$. For example, in
Fig. 6(a), when $\gamma = 4$ and $\tau = 1/64$, the maximum cluster sum rate is achieved at $B = 8$, but the cell sum
rate, given by the cluster sum rate divided by $B$, is maximum at $B = 3$. When the channel is more time or
frequency selective ($\tau = 1/32$), the optimum cluster size is $B = 1$ (no BS cooperation), as shown in
Fig. 6(b). Furthermore, the cell sum rate is more sensitive to the cluster size when the number of antennas
is larger.

VI. Conclusions 

We considered a multi-cell "network MIMO" system in a realistic cellular scenario, with inter-cell 
cooperation and fairness criteria. Specifically, we focused on linear zero-forcing beamforming combined 
with user selection. We derived the asymptotic expression in the large system limit and proposed an 
algorithm that computes the throughput point under an arbitrary fairness criterion, expressed by the 
maximization of a suitable concave and componentwise increasing network utility function over the region 
of achievable ergodic user rates. The proposed method handles the per-cluster sum-power constraint. 
We showed that under certain system symmetries, this coincides with the more stringent per-BS power 
constraint. In particular, the system symmetries make the analysis much simpler, as they allow for a
closed-form solution of a fixed-point equation that characterizes the zero-forcing beamforming performance. The
fairness scheduling was applied in the form of stochastic network optimization. The proposed asymptotic 
analysis is computationally much more efficient than the Monte Carlo simulation. It also provides a good 
approximation of finite-dimensional systems, when the users are randomly selected according to the 
asymptotic user fraction in the large system limit. In particular, we proposed a probabilistic scheduling
scheme that randomly associates users with downlink data streams according to probabilities obtained
from the asymptotic analysis, and provides a good approximation of the optimal throughput point while
requiring much less CSIT feedback resource.

Fig. 6. Cell sum rate versus cluster size $B$ for antenna ratio $\gamma = 1, 2, 4$, and $8$ under non-perfect CSIT and
explicit downlink training with the training-length parameter equal to $\gamma$, for $M = 24$ cells and $K = 192$ user
groups. (a) $\tau = 1/64$. (b) $\tau = 1/32$.

Our analytic tool can be extended to handle explicit channel state information estimation, obtained 
from downlink training. This allows the investigation of the tradeoff between the number of jointly 
coordinated antennas and the cost of estimating higher dimensional channels. This tradeoff yields the 
optimal "cooperation cluster size" that maximizes the system throughput subject to fairness, when the 
cost of channel estimation is also taken into account. Due to this training overhead, the increase in 
the cooperation cluster size does not necessarily correspond to a system throughput increase. As a 
matter of fact, our analysis shows that in most cases no cooperation among base stations (conventional 
cellular systems) with a significant number of antennas per base station (large $\gamma$) yields the best system
performance when the channel estimation cost is taken into account. This poses serious questions about
whether "network MIMO" is a desirable solution, also taking into account that base station cooperation
yields a non-trivial complexity increase in system implementation, requiring some form of centralized
processing of all the $B$ base stations in each cluster. It clearly appears that more effort should be devoted
to using a larger number of antennas at each base station, since this yields larger system throughput and 
significantly less system complexity. 

Appendix A
Proof of Theorem 1

For the sake of clarity, we recall some definitions and facts about random matrices with independent
non-identically distributed elements (see [53], [54]) of the type defined in (7), which will be essential
in the proof of Theorem 1.

Definition 1: Consider an $N_r \times N_c$ random matrix $H = [H_{i,j}]$, whose entries have variance

\[ \mathrm{Var}[H_{i,j}] = \frac{P_{i,j}}{N_r} \tag{56} \]

such that $P = [P_{i,j}]$ is an $N_r \times N_c$ deterministic matrix with uniformly bounded entries. For given $N_r$,
we define the variance profile of $H$ as the function $v^{N_r} : [0,1) \times [0,1) \to \mathbb{R}$ such that

\[ v^{N_r}(x,y) = P_{i,j}, \quad \frac{i-1}{N_r} \leq x < \frac{i}{N_r}, \quad \frac{j-1}{N_c} \leq y < \frac{j}{N_c}. \tag{57} \]

When we consider the limit for $N_r \to \infty$ with fixed ratio $N_c/N_r \to \nu$, we assume that $v^{N_r}(x,y)$ converges
uniformly to a bounded measurable function $v(x,y)$, referred to as the asymptotic variance profile of $H$.
For random matrices distributed according to Definition 1, we have the following results.

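Theorem 4 ([53, Theorem 2.52]): Let $H$ be an $N_r \times N_c$ random matrix whose entries are independent
zero-mean complex circularly symmetric random variables satisfying the Lindeberg condition

\[ \frac{1}{N_c} \sum_{i,j} E\left[ |H_{i,j}|^2 \, 1\{ |H_{i,j}| \geq \delta \} \right] \to 0 \tag{58} \]

as $N_r \to \infty$ with $N_c/N_r \to \nu$, for all $\delta > 0$. Assume that the variances of the elements of $H$ are given by
Definition 1 and define the function

\[ \Gamma^{(N_r)}(y,s) = h_j^H \Big( I + s \sum_{\ell \neq j} h_\ell h_\ell^H \Big)^{-1} h_j, \quad \frac{j-1}{N_c} \leq y < \frac{j}{N_c} \tag{59} \]

where $h_j$ denotes the $j$-th column of $H$. As $N_r \to \infty$ with $N_c/N_r \to \nu$, $\Gamma^{(N_r)}(y,s)$ converges almost surely
to the limit $\Gamma(y,s)$, given by the solution of the fixed-point equation

\[ \Gamma(y,s) = E\left[ \frac{v(X,y)}{1 + \nu s \, E\left[ \dfrac{v(X,Y)}{1 + s\Gamma(Y,s)} \,\Big|\, X \right]} \right] \tag{60} \]

where $X$ and $Y$ are i.i.d. random variables uniformly distributed on $[0,1]$.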
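Defining the effective dimension ratio as

\[ \nu' = \nu \, \frac{P(E[v(X,Y)|Y] \neq 0)}{P(E[v(X,Y)|X] \neq 0)}, \]

the following high-SNR limit can be proved.

Corollary 1 (see [55, Theorem 3.1]): As $s$ goes to infinity, we have

\[ \lim_{s \to \infty} \Gamma(y,s) = \begin{cases} \Gamma_\infty(y) & \text{if } \nu' < 1 \\ 0 & \text{if } \nu' > 1 \end{cases} \tag{61} \]

where, for $\nu' < 1$, $\Gamma_\infty(y)$ is the positive solution to

\[ \Gamma_\infty(y) = E\left[ \frac{v(X,y)}{1 + \nu \, E\left[ \dfrac{v(X,Y)}{\Gamma_\infty(Y)} \,\Big|\, X \right]} \right]. \tag{62} \]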



We now turn to the proof of Theorem 1. Using the well-known formula for the inverse of
a $2 \times 2$ block matrix, we can write the $(j,j)$ diagonal element of the matrix $(I + sH^H H)^{-1}$ as

\[ \left[ (I + sH^H H)^{-1} \right]_{j,j} = \frac{1}{1 + s \, h_j^H \Big( I + s \sum_{\ell \neq j} h_\ell h_\ell^H \Big)^{-1} h_j}. \tag{63} \]
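Furthermore, assuming that $H$ has full rank, then

\[ \left[ (H^H H)^{-1} \right]_{j,j} = \lim_{s \to \infty} s \left[ (I + sH^H H)^{-1} \right]_{j,j} = \frac{1}{\lim_{s \to \infty} h_j^H \Big( I + s \sum_{\ell \neq j} h_\ell h_\ell^H \Big)^{-1} h_j}. \tag{64} \]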
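The two matrix identities used at this step of the proof (the block-inverse expression for a diagonal element of the inverse, and its large-$s$ limit) are easy to check numerically. The following is a minimal sketch under our own assumptions: a small random complex matrix with full column rank, a hypothetical function name, and a large but finite value of $s$ standing in for the limit.

```python
import numpy as np


def check_diag_identities(n_rows=12, n_cols=5, s=3.0, s_big=1e9, j=2, seed=0):
    """Return both sides of the finite-s identity and the two quantities linked by the limit."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n_rows, n_cols))
         + 1j * rng.standard_normal((n_rows, n_cols))) / np.sqrt(2 * n_rows)
    h_j = H[:, j]
    H_rest = np.delete(H, j, axis=1)      # all columns except the j-th
    gram_rest = H_rest @ H_rest.conj().T  # sum over l != j of h_l h_l^H

    # Finite s: [(I + s H^H H)^{-1}]_{jj} versus 1 / (1 + s h_j^H (I + s sum)^{-1} h_j)
    lhs = np.linalg.inv(np.eye(n_cols) + s * H.conj().T @ H)[j, j].real
    quad = (h_j.conj() @ np.linalg.solve(np.eye(n_rows) + s * gram_rest, h_j)).real
    rhs = 1.0 / (1.0 + s * quad)

    # Large s: h_j^H (I + s sum)^{-1} h_j approaches 1 / [(H^H H)^{-1}]_{jj}
    inv_gram_jj = np.linalg.inv(H.conj().T @ H)[j, j].real
    quad_large = (h_j.conj() @ np.linalg.solve(np.eye(n_rows) + s_big * gram_rest, h_j)).real
    return lhs, rhs, inv_gram_jj, quad_large


if __name__ == "__main__":
    print(check_diag_identities())
```

The first pair of returned values should agree to machine precision, while the product of the last two should be close to one, with an error that shrinks as `s_big` grows.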



Comparing the definition of $\Lambda_i^{(k)}(\mu)$ in (11) with (64) and using Theorem 4 and Corollary 1, we have
that the desired limiting value of $\Lambda_i^{(k)}(\mu)$ is given by $\Gamma_\infty(y)$, evaluated at the corresponding value of $y$
such that $\frac{j-1}{N_c} \leq y < \frac{j}{N_c}$ for $j = \sum_{\ell=1}^{k-1} \mu_\ell N + i$, after replacing the general matrix $H$ in Theorem 4 with
$H_\mu$ given by our problem.

In this case, the number of rows in the matrix is given by $N_r = \gamma BN$ and the number of columns
is given by $N_c = \mu N$. With the normalization by $1/\sqrt{N}$ of all the channel coefficients, the matrix $H_\mu$
defined in (7) is formed by independent blocks $\beta_{m,k} H_{m,k}(\mu_k)$ of dimension $\gamma N \times \mu_k N$, such that each block
has i.i.d. $\mathcal{CN}(0, \beta_{m,k}^2/N)$ elements. As $N \to \infty$, we have that $N_c, N_r \to \infty$ with ratio $\nu = \frac{\mu}{\gamma B}$. By
imposing the appropriate normalization, the asymptotic variance profile of $H_\mu$ is given by the piece-wise
constant function

\[ v(x,y) = \gamma B \beta_{m,k}^2 \quad \text{for } (x,y) \in \left[ \frac{m-1}{B}, \frac{m}{B} \right) \times \left[ \frac{\mu_{1:k-1}}{\mu}, \frac{\mu_{1:k}}{\mu} \right) \tag{65} \]

with $m = 1, \ldots, B$ and $k = 1, \ldots, A$. Also, the effective dimension ratio $\nu'$ can be found explicitly (66),
and we notice that the case $\nu' < 1$ in (61) always holds since, by construction, $\mathrm{rank}(H_\mu) = \mu N$ almost
surely. Hence, the limit for $\Lambda_i^{(k)}(\mu)$ is obtained as the solution of the fixed-point equation (62), for any
$y \in \left[ \frac{\mu_{1:k-1}}{\mu}, \frac{\mu_{1:k}}{\mu} \right)$. The piece-wise constant form of $v(x,y)$ yields that $\Lambda_i^{(k)}(\mu)$ converges to a
limit that depends only on $k$ (the user group) and not on $i$ (the specific user in the group).
With some abuse of notation, we let $\Lambda_k(\mu) = \Gamma_\infty(y)$ for all $y \in \left[ \frac{\mu_{1:k-1}}{\mu}, \frac{\mu_{1:k}}{\mu} \right)$, in order to denote this
limit. Particularizing (62) to this case, we obtain

\[ \Lambda_k(\mu) = \gamma \sum_{m=1}^{B} \frac{\beta_{m,k}^2}{1 + \sum_{q=1}^{A} \mu_q \dfrac{\beta_{m,q}^2}{\Lambda_q(\mu)}}, \quad k = 1, \ldots, A. \tag{67} \]
It follows that the asymptotic limit of $\Lambda_\mu$ is block-diagonal, with scaled-identity diagonal blocks, where
the $k$-th block is given by $\Lambda_k(\mu) I_{\mu_k N}$.

The expression in (23) is eventually obtained by defining the auxiliary variables

\[ \eta_m = \frac{1}{1 + \sum_{q=1}^{A} \mu_q \dfrac{\beta_{m,q}^2}{\Lambda_q(\mu)}}, \quad m = 1, \ldots, B \tag{68} \]

such that, from (67), we have $\Lambda_k(\mu) = \gamma \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m$. Using this in (68), we obtain the fixed-point
equations

\[ \eta_m = \frac{1}{1 + \sum_{q=1}^{A} \dfrac{\mu_q \beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \beta_{\ell,q}^2}}, \quad m = 1, \ldots, B \tag{69} \]

which, after simple manipulation, yields (24), as given in Theorem 1.



As a final remark, notice that (69) has some significant advantages with respect to (67). In particular,
the variables $\eta_m$ take values in $[0,1]$ (by construction), and typically we have $B < A$ (fewer BSs in
a cluster than user groups). Therefore, (69) can be initialized by letting $\eta_m = 1$, and the iterative solution
of the fixed-point equation involves only $B$, rather than $A$, variables. Also, it is immediately evident by
inspection that the solution of (69) for $\eta_m \in [0,1]$ always exists and is unique.
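The iterative solution mentioned in the remark above can be sketched as follows. This is a minimal illustration, not the implementation used for the numerical results: the function name, stopping rule, and array layout (a $B \times A$ array of squared pathloss coefficients) are our own assumptions, and we assume every user group has at least one non-zero pathloss coefficient so that no division by zero occurs.

```python
import numpy as np


def solve_eta(beta2, mu, gamma, tol=1e-12, max_iter=10_000):
    """Plain fixed-point iteration for the B variables eta_m, started from eta_m = 1.

    beta2 : (B, A) array with entries beta_{m,q}^2
    mu    : length-A array of user fractions mu_q
    gamma : antenna ratio
    Returns (eta, lam) with lam_k = gamma * sum_m beta_{m,k}^2 * eta_m.
    """
    beta2 = np.asarray(beta2, dtype=float)
    mu = np.asarray(mu, dtype=float)
    eta = np.ones(beta2.shape[0])
    for _ in range(max_iter):
        lam = gamma * eta @ beta2                  # Lambda_q for the current eta
        new = 1.0 / (1.0 + beta2 @ (mu / lam))     # right-hand side of the fixed-point map
        done = np.max(np.abs(new - eta)) < tol
        eta = new
        if done:
            break
    return eta, gamma * eta @ beta2


if __name__ == "__main__":
    # Single isolated cell with one user group: eta = 1 - mu/gamma, Lambda = gamma - mu.
    print(solve_eta([[1.0]], [1.0], 4.0))
```

In the single-cell, single-group special case the recursion reduces to $\eta = 1/(1 + \mu/(\gamma\eta))$, whose solution is $\eta = 1 - \mu/\gamma$, so that $\Lambda = \gamma - \mu$.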



Appendix B
Proof of Theorem 2

Before entering the proof of Theorem 2, we state and prove two auxiliary results given here as Lemma 2
and Lemma 3.

Lemma 2: Let $H$ be an $N_r \times N_c$ matrix of independent zero-mean elements whose variance is $O(1/N_r)$
and 4-th moment is $O(1/N_r^2)$, with converging variance profile as in Definition 1. Let $h_{i,j}$ denote its
element in position $(i,j)$, $h_j$ denote its $j$-th column, $r_i$ denote its $i$-th row after removing the $(i,j)$-th
element, and $c_j$ denote its $j$-th column after removing the $(i,j)$-th element. Also, let $H_j$ denote the
matrix of dimension $N_r \times (N_c - 1)$ obtained from $H$ by removing $h_j$, and $H_{i,j}$ denote the matrix of
dimension $(N_r - 1) \times (N_c - 1)$ obtained from $H_j$ by removing its $i$-th row, which coincides with $r_i$.
Consider the two quadratic forms:

\[ \eta_j = h_j^H \left[ I + sH_j H_j^H \right]^{-1} h_j \tag{70} \]

and

\[ \eta_{i,j} = c_j^H \left[ I + sH_{i,j} H_{i,j}^H \right]^{-1} c_j \tag{71} \]

where $s > 0$ is a real parameter and where $I$ indicates an identity matrix of appropriate dimension. Assume
that $H$, $H_j$ and $H_{i,j}$ all have full column rank, i.e., $\mathrm{rank}(H) = N_c$, $\mathrm{rank}(H_j) = \mathrm{rank}(H_{i,j}) = N_c - 1$,
with probability 1. Then, for any $s > 0$, $N_c$ and $N_r > N_c$, the difference $\chi_{i,j} = \eta_j - \eta_{i,j}$ is a non-negative
random variable that converges almost surely to zero as $N_r, N_c \to \infty$, when the ratio $N_c/N_r = \nu$ is kept
constant.

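Proof: Since the matrix $H$ has independent elements with the same order (with respect to $N_r$) of
variance and 4-th moment, it is sufficient to prove the statement for $i = j = 1$. First, notice that $\eta_1$ and
$\eta_{1,1}$ (defined through (70) and (71)) differ by the fact that the first row of the matrix $H$ is removed. The
lemma says, essentially, that under these conditions removing a row from the matrix $H$ yields a very
small variation of the value of the quadratic form (70); indeed, this variation tends to zero a.s. when
the matrix dimensions grow large with fixed ratio. We can write

\[ H = [h_1 \;\; H_1] = \begin{bmatrix} h_{1,1} & r_1 \\ c_1 & H_{1,1} \end{bmatrix}. \tag{72} \]

In order to find a relationship between $\eta_1$ and $\eta_{1,1}$, we notice that

\[ I + sH_1 H_1^H = \begin{bmatrix} 1 + s r_1 r_1^H & s r_1 H_{1,1}^H \\ s H_{1,1} r_1^H & I + sH_{1,1} H_{1,1}^H \end{bmatrix}. \tag{73} \]

Letting the 4 blocks of the $2 \times 2$ block matrix above be denoted by $M_{11}, M_{12}, M_{21}, M_{22}$ and defining
the Schur complements of the diagonal blocks by

\[ \Delta_{11} = M_{11} - M_{12} M_{22}^{-1} M_{21} = 1 + s r_1 r_1^H - s^2 r_1 H_{1,1}^H M_{22}^{-1} H_{1,1} r_1^H \tag{74} \]

and

\[ \Delta_{22} = M_{22} - M_{21} M_{11}^{-1} M_{12} = M_{22} - \frac{s^2 H_{1,1} r_1^H r_1 H_{1,1}^H}{1 + s r_1 r_1^H} \tag{75} \]

we have the $2 \times 2$ matrix inversion formula

\[ \left[ I + sH_1 H_1^H \right]^{-1} = \begin{bmatrix} \Delta_{11}^{-1} & -\Delta_{11}^{-1} M_{12} M_{22}^{-1} \\ -\Delta_{22}^{-1} M_{21} M_{11}^{-1} & \Delta_{22}^{-1} \end{bmatrix}. \tag{76} \]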



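After some tedious but straightforward algebra based on repeated use of the Sherman-Morrison matrix
inversion lemma (omitted for brevity), and noticing that in this case $\Delta_{11}$ is a scalar quantity, we arrive
at

\[ \eta_1 = h_1^H \left[ I + sH_1 H_1^H \right]^{-1} h_1 = \frac{|h_{1,1}|^2 - 2s \, \mathrm{Re}\{ h_{1,1}^* r_1 H_{1,1}^H M_{22}^{-1} c_1 \} + |s r_1 H_{1,1}^H M_{22}^{-1} c_1|^2}{\Delta_{11}} + c_1^H M_{22}^{-1} c_1. \tag{77} \]

From definition (71) we have that $\eta_{1,1} = c_1^H M_{22}^{-1} c_1$. Therefore,

\[ \chi_{1,1} = \frac{|h_{1,1}|^2 - 2s \, \mathrm{Re}\{ h_{1,1}^* r_1 H_{1,1}^H M_{22}^{-1} c_1 \} + |s r_1 H_{1,1}^H M_{22}^{-1} c_1|^2}{\Delta_{11}} \tag{78} \]

\[ \phantom{\chi_{1,1}} = \frac{\left| h_{1,1} - s r_1 H_{1,1}^H M_{22}^{-1} c_1 \right|^2}{\Delta_{11}}. \tag{79} \]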

For finite $N_r$, the RHS of the above equality is a non-negative random variable. First, we check that
$\chi_{1,1}$ is well-defined for all $s > 0$. By writing explicitly the denominator, we have

\begin{align}
\Delta_{11} &= 1 + s r_1 r_1^H - s^2 r_1 H_{1,1}^H M_{22}^{-1} H_{1,1} r_1^H \nonumber \\
&= 1 + s r_1 r_1^H - s^2 r_1 H_{1,1}^H \left[ I + sH_{1,1} H_{1,1}^H \right]^{-1} H_{1,1} r_1^H \nonumber \\
&= 1 + s r_1 r_1^H - s^2 r_1 H_{1,1}^H H_{1,1} r_1^H + s^3 r_1 H_{1,1}^H H_{1,1} \left[ I + sH_{1,1}^H H_{1,1} \right]^{-1} H_{1,1}^H H_{1,1} r_1^H \tag{80} \\
&= 1 + r_1 \left( sI - s^2 H_{1,1}^H H_{1,1} + s^3 H_{1,1}^H H_{1,1} \left[ I + sH_{1,1}^H H_{1,1} \right]^{-1} H_{1,1}^H H_{1,1} \right) r_1^H \tag{81} \\
&= 1 + r_1 U \, \mathrm{diag}\left( s - s^2 \Gamma_i + \frac{s^3 \Gamma_i^2}{1 + s\Gamma_i} \right) U^H r_1^H \tag{82} \\
&= 1 + r_1 U \, \mathrm{diag}\left( \frac{1}{1/s + \Gamma_i} \right) U^H r_1^H \tag{83}
\end{align}

where (80) follows by applying the matrix inversion lemma, (81) is obtained by collecting $r_1$ on the
left and on the right, (82) follows by letting $H_{1,1}^H H_{1,1} = U \Gamma U^H$, where $U$ is $(N_c - 1) \times (N_c - 1)$
unitary and $\Gamma$ is the diagonal matrix of the eigenvalues of $H_{1,1}^H H_{1,1}$, denoted by $\Gamma_i$, that are all strictly
positive by the assumption that $\mathrm{rank}(H_{1,1}) = N_c - 1$, and finally (83) follows by simplifying the terms in the
inner diagonal matrix. Notice that as $s \to \infty$, $\Delta_{11} \to 1 + r_1 (H_{1,1}^H H_{1,1})^{-1} r_1^H$, where the convergence is
monotone from below, i.e.,

\[ 1 \leq \Delta_{11} \leq 1 + r_1 \left( H_{1,1}^H H_{1,1} \right)^{-1} r_1^H \]

for all $s > 0$.

Next, we examine the term $s r_1 H_{1,1}^H M_{22}^{-1} c_1$ in the numerator of (79). By proceeding in a very similar
way as before, we arrive at (details are omitted for the sake of brevity):

\[ s r_1 H_{1,1}^H M_{22}^{-1} c_1 = r_1 U \, \mathrm{diag}\left( \frac{1}{1/s + \Gamma_i} \right) U^H H_{1,1}^H c_1. \tag{84} \]

For $s \to \infty$, we have that this term tends to $r_1 (H_{1,1}^H H_{1,1})^{-1} H_{1,1}^H c_1$. Since $r_1$ and $c_1$ are vectors of
independent components, mutually independent, and independent of $H_{1,1}$, and since they have elements
with mean 0, variance $O(1/N_r)$ and 4-th order moment $O(1/N_r^2)$, also the inner product in the right-hand
side of (84) has variance and 4-th order moment with the same behavior, for all $s > 0$. For $N_r \to \infty$,
$N_c/N_r = \nu$, we have that $\Delta_{11}$ tends to a finite deterministic constant $d(s) \geq 1$, that can be calculated by
standard results in random matrix theory. In contrast, the numerator $|h_{1,1} - s r_1 H_{1,1}^H M_{22}^{-1} c_1|^2$ of $\chi_{1,1}$
converges to 0 a.s., since

\[ \left| h_{1,1} - s r_1 H_{1,1}^H M_{22}^{-1} c_1 \right|^2 \leq 4 \max\left\{ |h_{1,1}|^2, \left| s r_1 H_{1,1}^H M_{22}^{-1} c_1 \right|^2 \right\} \]

and each of these terms tends to 0 a.s. This concludes the proof of Lemma 2. ■
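The finite-dimensional part of the argument, namely that removing one row changes the quadratic form by a non-negative amount equal to a squared modulus divided by the Schur complement $\Delta_{11}$, is an exact algebraic identity and can be verified numerically. The sketch below is only an illustration under our own assumptions (i.i.d. complex Gaussian entries of variance $1/N_r$, hypothetical function name and dimensions).

```python
import numpy as np


def lemma2_gap(n_rows=40, n_cols=10, s=2.0, seed=0):
    """Return (eta_1 - eta_{1,1}, |h11 - s r1 H11^H M22^{-1} c1|^2 / Delta_11)."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n_rows, n_cols))
         + 1j * rng.standard_normal((n_rows, n_cols))) / np.sqrt(2 * n_rows)
    h11, r1 = H[0, 0], H[0, 1:]        # first row: h_{1,1} and the row vector r_1
    c1, H11 = H[1:, 0], H[1:, 1:]      # first column below h_{1,1}, and H_{1,1}
    h1, H1 = H[:, 0], H[:, 1:]         # first column and the remaining columns

    eta1 = (h1.conj() @ np.linalg.solve(np.eye(n_rows) + s * H1 @ H1.conj().T, h1)).real
    M22 = np.eye(n_rows - 1) + s * H11 @ H11.conj().T
    M22inv_c1 = np.linalg.solve(M22, c1)
    eta11 = (c1.conj() @ M22inv_c1).real

    Hr = H11 @ r1.conj()               # H_{1,1} r_1^H as a column vector
    delta11 = (1.0 + s * np.vdot(r1, r1)
               - s ** 2 * (Hr.conj() @ np.linalg.solve(M22, Hr))).real
    numerator = abs(h11 - s * (r1 @ H11.conj().T @ M22inv_c1)) ** 2
    return eta1 - eta11, numerator / delta11


if __name__ == "__main__":
    print(lemma2_gap())
```

Increasing `n_rows` and `n_cols` with a fixed ratio should make the returned gap shrink, in agreement with the statement of the lemma.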
Remark 1: We wish to remark at this point that the result of Lemma 2 is expected. In fact, notice
that the two quadratic forms in (70) and (71) correspond to the "multiuser efficiency" of user $j$, in two
systems that differ just by eliminating one row (the $i$-th row) from the spreading matrix $H$. By eliminating
a single row, the asymptotic properties of the matrix do not change. In fact, the variance profile and the
matrix aspect ratio remain the same. Hence, it is completely expected that the difference between the
two efficiencies, asymptotically, vanishes. A byproduct of the above analysis is that, since all quantities
involved in the ratio (79) are bounded functions of $s$, by bounded convergence we can exchange the
limits of $s \to \infty$ and $N_r \to \infty$. This will be used in the proof of Theorem 2, since we notice that the
limit of $\eta_j$ for $s \to \infty$ yields, for appropriate choice of the column $j$, the elements $\Lambda_i^{(k)}(\mu)$ (see (64))
appearing as the gains of the zero-forcing precoder.
Lemma 3: Let $x$ be an $n$-dimensional vector with i.i.d. entries with variance $\frac{1}{n}$. Let $A$ and $C$ be $n \times n$
Hermitian symmetric matrices independent of $x$, and let $D$ be an $n \times n$ diagonal matrix independent of
$x$. Then:

\[ x^H D^H (D x x^H D^H + A)^{-1} C (D x x^H D^H + A)^{-1} D x \to \frac{\phi(D^H A^{-1} C A^{-1} D)}{\left( 1 + \phi(D^H A^{-1} D) \right)^2} \]

where $\phi(\cdot) = \lim_{n \to \infty} \frac{1}{n} \mathrm{tr}(\cdot)$ and the convergence is almost sure.

Proof: Let

\[ Q = x^H D^H (D x x^H D^H + A)^{-1} C (D x x^H D^H + A)^{-1} D x. \]

From the inversion lemma we have that

\[ (D x x^H D^H + A)^{-1} = A^{-1} - \frac{A^{-1} D x x^H D^H A^{-1}}{1 + x^H D^H A^{-1} D x}. \tag{85} \]

Hence,

\begin{align}
Q &= x^H D^H A^{-1} C (D x x^H D^H + A)^{-1} D x - \frac{x^H D^H A^{-1} D x \; x^H D^H A^{-1} C (D x x^H D^H + A)^{-1} D x}{1 + x^H D^H A^{-1} D x} \nonumber \\
&= \frac{a}{1 + x^H D^H A^{-1} D x} \tag{86}
\end{align}

where

\[ a = x^H D^H A^{-1} C (D x x^H D^H + A)^{-1} D x. \]

Applying again the inversion lemma we have:

\begin{align}
a &= x^H D^H A^{-1} C A^{-1} D x - \frac{x^H D^H A^{-1} C A^{-1} D x \; x^H D^H A^{-1} D x}{1 + x^H D^H A^{-1} D x} \nonumber \\
&= \frac{b}{1 + x^H D^H A^{-1} D x} \tag{87}
\end{align}

where

\[ b = x^H D^H A^{-1} C A^{-1} D x. \]

From (86) and (87) we obtain

\[ Q = \frac{x^H D^H A^{-1} C A^{-1} D x}{\left( 1 + x^H D^H A^{-1} D x \right)^2}. \tag{88} \]

Finally, we arrive at the desired result by using the well-known fact in random matrix theory according to
which $\lim_{n \to \infty} x^H M x = \phi(M)$ provided that $x$, $M$ are independent, that $M$ has a well-defined limiting
eigenvalue distribution and that $x$ has i.i.d. elements with mean zero and variance $1/n$. ■
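The finite-$n$ identity obtained in the proof by two applications of the inversion lemma holds exactly for any realization, which makes it convenient to test numerically. The following sketch is an illustration under our own assumptions (random positive definite $A$, random Hermitian $C$, real positive diagonal $D$, hypothetical function name); it does not test the asymptotic trace convergence.

```python
import numpy as np


def lemma3_both_sides(n=6, seed=1):
    """Return Q computed directly and via the closed form b / (1 + x^H D^H A^{-1} D x)^2."""
    rng = np.random.default_rng(seed)
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2 * n)
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = G @ G.conj().T + np.eye(n)            # Hermitian positive definite
    K = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    C = K + K.conj().T                        # Hermitian
    D = np.diag(rng.uniform(0.5, 1.5, n))     # real diagonal, so D^H = D

    v = D @ x
    R = np.linalg.inv(np.outer(v, v.conj()) + A)   # (D x x^H D^H + A)^{-1}
    A_inv = np.linalg.inv(A)

    direct = (v.conj() @ R @ C @ R @ v).real
    t = (v.conj() @ A_inv @ v).real
    closed_form = (v.conj() @ A_inv @ C @ A_inv @ v).real / (1.0 + t) ** 2
    return direct, closed_form


if __name__ == "__main__":
    print(lemma3_both_sides())
```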

Using Lemmas 2 and 3, we can proceed with the proof of Theorem 2. From the expression of $\theta_{m,k}(\mu)$,
it follows that



1 

N 



i=l e=l+{m-l)iN 



= ^tr (*^H^(HH H^)-iAj/20fcAj/2(HH H^)-iHj$„) 



(89) 



where $\Phi_m$ is a diagonal matrix with all zeros, but for $\gamma N$ consecutive ones, corresponding to positions
from $(m-1)\gamma N + 1$ to $m\gamma N$ on the main diagonal, and where $\Theta_k$ denotes the $\mu N$-dimensional diagonal
matrix with all zeros, but for $\mu_k N$ consecutive ones, corresponding to positions from $\mu_{1:k-1}N + 1$ to
$\mu_{1:k}N$ on the main diagonal (recall that we define the partial sum $\mu_{1:k} = \sum_{j=1}^{k} \mu_j$).

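The submatrix of $\Phi_m H_\mu$ corresponding to the non-zero rows, i.e., including rows from $(m-1)\gamma N + 1$
to $m\gamma N$, can be written as

\[ \left[ \beta_{m,1} H_{m,1}(\mu_1), \ldots, \beta_{m,A} H_{m,A}(\mu_A) \right] = W_m B_m \tag{90} \]

where $W_m$ is a $\gamma N \times \mu N$ rectangular matrix with i.i.d. entries, with mean 0 and variance $1/N$, and

\[ B_m = \mathrm{diag}\Big( \underbrace{\beta_{m,1}, \ldots, \beta_{m,1}}_{\mu_1 N}, \ldots, \underbrace{\beta_{m,k}, \ldots, \beta_{m,k}}_{\mu_k N}, \ldots, \underbrace{\beta_{m,A}, \ldots, \beta_{m,A}}_{\mu_A N} \Big). \tag{91} \]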

After simple algebraic manipulation, letting the ^-th row of be denoted by mv^ ^ we can write 

B 



H — ^ H 

BmW 

+ J2 B,W,^W,B,. 



(92) 



g/m 



In order to be able to apply Lemma [3} we need that the variance of the elements of the i.i.d. vector w^^^ 
(playing the role of x in the Lemma), is equal to the inverse of the vector length. Therefore, dividing by 
/X, we define 



1 ^ 

A = -J^B.W.^W.B, 



and 



= A 



q=l 



1. 



(93) 



(94) 



At this point, we let A^ denote the diagonal matrix with diagonal element in position given by 

where j = ixi-k-iN + i, and where hp indicates the p-th column of Hyi^. Also, we denote by Ayi^ the 
diagonal matrix with diagonal element in position (j, j) given by 
where $h_p$ denote the columns of $H_\mu$ after removing the row corresponding to $w_{m,\ell}$, in the horizontal
$m$-th "slice" of $H_\mu$ defined in (90). By the result of Lemma 2 and using the argument in Remark 1
about the exchange of the limits for $s \to \infty$ and for $N \to \infty$, we can write



(95) 



P+3 



where $\chi_j$ is a non-negative random variable converging to zero almost surely, as $N \to \infty$.

We would like to apply Lemma 3 to a suitably manipulated version of the expression (89) of $\theta_{m,k}(\mu)$.
As we shall see below, the vector $w_{m,\ell}$ and a matrix that depends on the elements of $\Lambda_\mu$ play the role
of $x$ and $C$ in Lemma 3. However, since $w_{m,\ell}$ defines a row of $H_\mu$, and $\Lambda_\mu$ is also a function of $H_\mu$,
the independence required by Lemma 3 does not hold. Therefore, we shall use $\Lambda_\mu^{(m,\ell)}$ in lieu of $\Lambda_\mu$ in (89),
allowing for a small error due to the terms $\chi_j$. By doing so, we will be able to apply Lemma 3, since
$w_{m,\ell}$ and $\Lambda_\mu^{(m,\ell)}$ are statistically independent. By letting $N \to \infty$ and using Lemma 2, we shall show that the
error term vanishes almost surely.



1/2 1/2 

Keeping in mind the above proof scheme, we write the matrix A^ ®k^fi as: 



A^/'0,A 



1/2 



(96) 



where both C 



are diagonal matrices, with 

A^\fi) for j = ;xi:fc_iA^ + z, 



and with 







elsewhere 



Xj for j = A/i:/c-iA^ + i, 
elsewhere 



,A^ 



(97) 



(98) 



(m i) 

Notice that and ' ^ have both dimension jiiN x jj^N and are independent of MVm/- 
Using (91), (96), (93) and (94) in (89) we arrive at



(99) 



j(p(BmA-^CkA-^Bm) 
(l + </.(B^A-iB^))2 

1 1 /I 

• Qb^W^^^W^^^B^ + A^^^^ Bm^rn/-^ (101) 



where the last Une follows by applying Lemma pi where we replaced C^^'^^ with Ck defined as in (97) 



and Ajn^£ with A. This can be done since ' ^ and Ck have the same limit for ^ oc (see proof 



of Lemma [2]), and by noticing that all the terms for different i in the sum in ( [99] ) converge to the same 



limit (by statistical symmetry), that can be obtained by using A in lieu of A^^^ after applying Lemma 

m 

Let's examine the limit of the error term in ( |101| l. We can write 



iN ^ /I 



• ^-B^w^^^w^^^^B^ + A^^^^ B^w^^^— (102) 



< xS^^)j4=l|w™/f (103) 



where is a constant bounded away from zero and proportional to the square of the non-zero minimum 
eigenvalue of A (notice that A is invertible by construction), and where Xmax is the maximum of the 



elements Xj ' • For ^ oc, we have that ||w^^^|p converges almost surely to a finite deterministic 

max I3L u 

limit (strong law of large numbers). Hence, limA^^cx) ^ J2J=i -^\\^m/\\ ^ converges almost 

surely to a finite constant, and using Lemma |2] we have that the limit in ( |101| ) is zero almost surely. 



We have concluded that the sought limit for $N \to \infty$ of $\theta_{m,k}(\mu)$ is given by the expression in (100).
Therefore, our goal is now to evaluate the two limit normalized traces in (100). We start with the term in
the denominator, which is considerably simpler. We have



(B^A ^Bm) 



lim — tr(B^A ^B^) 



lim tr 

iV->oo jlN 



/Li 



B 



= lim 



A IJikN n2 



m.k 



k=i 1=1 yt^) 



(104) 



where we used the fact that, by definition, 



k 



for the diagonal elements of \ li^¥Lfj,j in position fii-^^iN+i for z = 1, . . . fikN, and the convergence 
result of Theorem [T| Comparing ( |104| ) with the expression of r]m in ([68]) in Appendix |A| we have that 

where {r]m{tJ^) : m = 1, . . . , 5} are defined in Theorem [T] as the solutions of the fixed-point equation 
( |24| ). Then, the denominator of ( |100| ) can be written as 

v2 



(l + </>(B^A-iB^)) = r?-2(/.) 



(106) 



Next, we consider the numerator of (100). For this purpose, let $\rho$ be a dummy non-negative real variable
and consider the identity:

\[ -\frac{d}{d\rho} \mathrm{tr}\left( (\rho B_m^2 + A)^{-1} C_k \right) = \mathrm{tr}\left( B_m (\rho B_m^2 + A)^{-1} C_k (\rho B_m^2 + A)^{-1} B_m \right). \tag{107} \]

By almost-sure continuity of the trace in the left-hand side of (107) with respect to $\rho \geq 0$, it follows that
the desired expression for the numerator of (100) can be calculated as

\[ \phi\left( B_m A^{-1} C_k A^{-1} B_m \right) = -\lim_{\rho \downarrow 0} \frac{d}{d\rho} \phi\left( (\rho B_m^2 + A)^{-1} C_k \right). \tag{108} \]
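The trace-derivative identity used above follows from $\frac{d}{d\rho} M(\rho)^{-1} = -M(\rho)^{-1} \frac{dM}{d\rho} M(\rho)^{-1}$ with $M(\rho) = \rho B_m^2 + A$. As a sanity check, the sketch below compares a central finite difference of the trace with the closed-form right-hand side; the random test matrices, the step size, and the function name are our own assumptions.

```python
import numpy as np


def trace_derivative_check(n=5, rho=0.3, eps=1e-6, seed=2):
    """Return (-d/d rho of tr((rho B^2 + A)^{-1} C) by finite differences, closed-form trace)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    A = G @ G.T + np.eye(n)                    # symmetric positive definite
    B = np.diag(rng.uniform(0.5, 1.5, n))      # diagonal, playing the role of B_m
    C = np.diag(rng.uniform(0.0, 1.0, n))      # diagonal, playing the role of C_k

    def f(r):
        return np.trace(np.linalg.inv(r * B @ B + A) @ C)

    finite_diff = -(f(rho + eps) - f(rho - eps)) / (2.0 * eps)
    R = np.linalg.inv(rho * B @ B + A)
    closed_form = np.trace(B @ R @ C @ R @ B)
    return finite_diff, closed_form


if __name__ == "__main__":
    print(trace_derivative_check())
```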



In order to compute the asymptotic normalized trace in (108), we use [53, Lemma 2.51], reported here
for completeness.

Lemma 4: Let $H$ be $N_r \times N_c$ of the type given in Definition 1, satisfying the same assumptions of
Theorem 4. For any $a, b \in [0,1]$ with $a < b$,

\[ \frac{1}{N_r} \sum_{i = \lfloor a N_r \rfloor}^{\lfloor b N_r \rfloor} \left[ (I + sHH^H)^{-1} \right]_{i,i} \to \int_a^b \Gamma_{HH^H}(x,s) \, dx \tag{109} \]

where $N_c/N_r \to \nu$ and where $\Gamma_{HH^H}(x,s)$ and $\Upsilon_{HH^H}(y,s)$ are functions defined implicitly by the fixed-point
equation

\begin{align}
\Gamma_{HH^H}(x,s) &= \frac{1}{1 + \nu s \, E\left[ v(x,Y) \Upsilon_{HH^H}(Y,s) \right]} \nonumber \\
\Upsilon_{HH^H}(y,s) &= \frac{1}{1 + s \, E\left[ v(X,y) \Gamma_{HH^H}(X,s) \right]} \tag{110}
\end{align}

for $(x,y) \in [0,1] \times [0,1]$, where $X$ and $Y$ are i.i.d. uniform-$[0,1]$ RVs and where the variance profile
function $v(x,y)$ was introduced in Definition 1. ■
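In order to use Lemma 4, we write

\begin{align}
\mathrm{tr}\left( (\rho B_m^2 + A)^{-1} C_k \right) &= \mathrm{tr}\left( (\rho I + B_m^{-1} A B_m^{-1})^{-1} B_m^{-1} C_k B_m^{-1} \right) \nonumber \\
&= \frac{1}{\rho} \mathrm{tr}\left( \left( I + \frac{1}{\rho} B_m^{-1} A B_m^{-1} \right)^{-1} B_m^{-1} C_k B_m^{-1} \right). \tag{111}
\end{align}

Noticing that, by definition, $A = \frac{1}{\mu} H_\mu^H H_\mu$, we can identify the matrix $\frac{1}{\sqrt{\mu}} B_m^{-1} H_\mu^H$ with the matrix $H$
of Lemma 4. In this case, $N_r = \mu N$ and $N_c = \gamma BN$. Using $\{B_m\}$ and $\{W_m\}$ defined before, we can
write the block-matrix form

\[ H_\mu^H = \left[ B_1 W_1^H, B_2 W_2^H, \ldots, B_B W_B^H \right] \]

so that

\[ \frac{1}{\sqrt{\mu}} B_m^{-1} H_\mu^H = \frac{1}{\sqrt{\mu}} \left[ B_m^{-1} B_1 W_1^H, B_m^{-1} B_2 W_2^H, \ldots, B_m^{-1} B_B W_B^H \right]. \]

It follows that the variance profile function of $\frac{1}{\sqrt{\mu}} B_m^{-1} H_\mu^H$ is given by

\[ v_m(x,y) = \frac{\beta_{\ell,k}^2}{\beta_{m,k}^2}, \quad \text{for } (x,y) \in \left[ \frac{\mu_{1:k-1}}{\mu}, \frac{\mu_{1:k}}{\mu} \right) \times \left[ \frac{\ell-1}{B}, \frac{\ell}{B} \right). \tag{112} \]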

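Using this in Lemma 4 and letting $1/\rho = s$, we find

\[ \frac{1}{\mu N} \sum_{i = \mu_{1:k-1}N + 1}^{\mu_{1:k}N} \left[ \left( I + s B_m^{-1} A B_m^{-1} \right)^{-1} \right]_{i,i} \to \int_{\mu_{1:k-1}/\mu}^{\mu_{1:k}/\mu} \Gamma_m(x,s) \, dx \tag{113} \]

where $\Gamma_m(x,s)$ and $\Upsilon_m(y,s)$ are defined by

\begin{align}
\Gamma_m(x,s) &= \frac{1}{1 + \frac{\gamma B}{\mu} s \, E\left[ v_m(x,Y) \Upsilon_m(Y,s) \right]} \nonumber \\
\Upsilon_m(y,s) &= \frac{1}{1 + s \, E\left[ v_m(X,y) \Gamma_m(X,s) \right]}. \tag{114}
\end{align}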
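Noticing that $v_m(x,y)$ is piecewise constant (see (112)), we have that also the functions $\Gamma_m(x,s)$ and
$\Upsilon_m(y,s)$ are piecewise constant. With some abuse of notation, we denote the values of these functions as
$\{\Gamma_{m,q}(s), q = 1, \ldots, A\}$ and $\{\Upsilon_{m,\ell}(s), \ell = 1, \ldots, B\}$, respectively. We find that (114) can be re-written
directly in terms of these values as

\begin{align}
\Gamma_{m,q}(s) &= \frac{1}{1 + \frac{\gamma s}{\mu \beta_{m,q}^2} \sum_{\ell=1}^{B} \beta_{\ell,q}^2 \Upsilon_{m,\ell}(s)}, \quad \text{for } q = 1, \ldots, A \nonumber \\
\Upsilon_{m,\ell}(s) &= \frac{1}{1 + \frac{s}{\mu} \sum_{q=1}^{A} \mu_q \frac{\beta_{\ell,q}^2}{\beta_{m,q}^2} \Gamma_{m,q}(s)}, \quad \text{for } \ell = 1, \ldots, B. \tag{115}
\end{align}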



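Finally, using (113) and (111) and noticing that the non-zero diagonal elements of $B_m^{-1} C_k B_m^{-1}$ converge
to the constant $\Lambda_k(\mu) \beta_{m,k}^{-2}$, we arrive at:

\[ \phi\left( (\rho B_m^2 + A)^{-1} C_k \right) = \frac{1}{\rho} \frac{\mu_k}{\mu} \Gamma_{m,k}(1/\rho) \Lambda_k(\mu) \beta_{m,k}^{-2}. \tag{116} \]

It turns out that it is convenient to define the new variables

\[ S_{m,q}(\rho) = \frac{1}{\rho \beta_{m,q}^2} \Gamma_{m,q}(1/\rho), \quad \text{and} \quad G_{m,\ell}(\rho) = \Upsilon_{m,\ell}(1/\rho). \]

Therefore, we can rewrite (115) and (116) as

\begin{align}
S_{m,q}(\rho) &= \frac{1}{\rho \beta_{m,q}^2 + \frac{\gamma}{\mu} \sum_{\ell=1}^{B} \beta_{\ell,q}^2 G_{m,\ell}(\rho)}, \quad \text{for } q = 1, \ldots, A \tag{117} \\
G_{m,\ell}(\rho) &= \frac{1}{1 + \frac{1}{\mu} \sum_{q=1}^{A} \mu_q \beta_{\ell,q}^2 S_{m,q}(\rho)}, \quad \text{for } \ell = 1, \ldots, B \tag{118} \\
\phi\left( (\rho B_m^2 + A)^{-1} C_k \right) &= \frac{\mu_k}{\mu} \Lambda_k(\mu) S_{m,k}(\rho). \tag{119}
\end{align}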



Taking the derivative in ( |119| l, we have that the numerator of <\100\ can be obtained as: 



lim — — <; 
pio dp 



— Afc(/x) lim -^Sm,k{p) 
fi p^o dp 



(120) 



where we define Sm,k{^) = -j^Sm,k{p)\p=o ^^^^ 1^^^^ ^s^' Gm/i^) = j^GmAp)\p=0' 

Next, we wish to find a fixed-point equation that yields directly Sm,k{0). By continuity, we can replace 
directly p = into the fixed point equations after taking the derivatives. By doing so, from ( |117| ) and 
( |118| ), we obtain: 

/3l,q+lEtlPlGmA0) 



Sm,q{0) 



Ell PlG^Ao)) 



-, for g = 1,...,^ 



for 



h...,B 



(121) 



(122) 
Also, the equations for $S_{m,q}(0)$ and $G_{m,\ell}(0)$, obtained by replacing $\rho = 0$ in (117), (118), read:

1 



1 



Replacing ([123]) into ([124]), we obtain, for all £ = 1, . . . , 

1 



, for g = 1,...,^ 



, for 



1 + 



(123) 
(124) 

(125) 



7E?=i/??,„,G„.,,(0) 



By multiplying both sides by 7/3|„ and summing over I, we find 



/3f 



B 

Um,q ~ T / , . ,, ,fl2 5 



(126) 



where we define Um,q — 7j2e=i Pe „Gm,ei^)- Comparing the fixed point equation (126) with (67), we 



discover that Ujn,q = Ag(/i), independent of m. Using this result in (123), we obtain 



S^,qiO) 



(127) 



Using the definition of Um,q, ( 121 ) can be written as. 



-5m,g(0) 



(128) 



where, with some abuse of notation, we define Um,q = 7j2f=i qGm/{0). 

Multiplying both sides of (122) by 7/3^^, using (128) and (127) and summing over i, we obtain 



B 



^ Y^q' = l ^^q'f^\q'Sm,q'{^) 
^ + lE}=lf^q'Plq>Sm,q'{0) 



1 , v-A /V^V 



q' = l 



B 



1=1 



(129) 



(130) 



where we have used again the identity ( |105| ) in the denominator of ( |129| ). Somehow surprisingly, we 
notice that ( 130) is a system of A linear equations in the A unknown {Um,q : g = 1, . . . , A}. Therefore, 
this can be solved explicitly (although not in closed form in general). In particular, we define the Ax A 
matrix 



M 



1=1 



diag 



Ml 



MA 



A?(/x)'""A2^(/x) 



(131) 
Fig. 7. Finite dimensional samples of $\theta_{m,k}(\mu)$ with $N = [4\ 8\ 16\ 32\ 64\ 128\ 256]$ (dots) and asymptotic values in the large
system limit (lines) for $m = 1$, $k = 1, \ldots, 8$ under the settings of Example 1.



where = . . . , and the vector of unknowns U^, then, we the Unear system corresponding 

to ( |130| ) is given by 

[I - 7M] = (132) 



Solving the system ( |132| ) and using ( |128| ) in ( |120| ), we obtain the sought numerator of ( |100| ) in the form 



(133) 



Finally, putting together ( |100| ), ( |106| ) and ( |133| ), we obtain our final result: 

7 0(B^A-iCfcA-iB^) 



M (l + 0(B^A-iB^))2 

7 M/c 

M Afc(/x) 



(134) 



where in the last line we used Theorem [T] Comparing ([27]) and ( |134| ), we see that the two expression 
coincide by letting = f^m/^- Therefore, Theorem [2] is proved. 

Fig. 7 shows finite dimensional samples of $\theta_{m,k}(\mu)$ for randomly generated channels and their
asymptotic values obtained from (134) under the settings of Example 1. As $N$ increases, the finite dimensional
samples (dots) converge to the asymptotic values (lines), and this example shows the validity of the
asymptotic analysis.



Appendix C
Proof of Lemma 1

We prove this lemma using the results of Theorem [2j Under the symmetric system conditions described 



in Section 



III-Al), each = • • • ^/^^a)^ ^ cyclic shift of another by multiples of = A/B, 



i.e., Prn^^j^k = l^m.k^AjA'^ ^s showu in (30). When jik = Mi for ^ equivalence class i, the 



matrix M in (29) becomes block-circulant with submatrices of size A' x A' , since 77^ (/x) is independent 
of I and A/e = A^ for all k in equivalence class i under the symmetry conditions. Then, the matrix 
[I — 7M], its inverse, and [I — 7M]~^M are also block-circulant and multiplying a block-circulant matrix 
[I — 7M]~^M by the cyclic shift of by jA' positions, for j G Z, produces the cyclic shift of in 
([28]) by jA' positions, i.e., we have (^^^j^^ = Cm,k®AjA'- From expression (27), noticing that fik = /x^ 



and J2f=i^ik ^ constants independent of k, for all k in equivalence class i, and rim{ii) is 



independent of m, we obtain that Or^^^j^k = ^m,k®AjA'^ as we wanted to prove. 
If also qk = q[ for all k in equivalence class z, we have 



A 

E 

k = l 



B A' 



QkOm^kit^) ^l^l^(l'iOmMA^A'{tj) 

t=l 1=1 



A' B A' 
i=l i=l i=l 



(135) 



where we used the fact that, by construction of the matrix and the definition of the coefficients 
^k,m{l^) (see (26)), the equality J2f=i ^k/{l^) = M/c holds in general (even in the non-symmetric case). 



Appendix D
Proof of Theorem 3



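Let $\widehat{\mathbf{V}}_{\boldsymbol{\mu}}$ denote the beamforming matrix for given user fractions $\boldsymbol{\mu}$, defined as in Section II-B with $\mathbf{H}_{\boldsymbol{\mu}}$ replaced by $\widehat{\mathbf{H}}_{\boldsymbol{\mu}}$, where $\widehat{\mathbf{H}}_{\boldsymbol{\mu}}$ is defined as $\mathbf{H}_{\boldsymbol{\mu}}$ with the coefficients $\beta_{m,k}$ replaced by the new coefficients defined in (50).

Let us focus on a generic user $j$ in group $k$. From (3) and (9), the received signal is given by

$$
\begin{aligned}
y_k^{(j)} &= \big(\mathbf{h}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} + z_k^{(j)} \\
&= \big(\hat{\mathbf{h}}_k^{(j)}\big)^{\mathsf{H}} \hat{\mathbf{v}}_k^{(j)} u_k^{(j)} + \big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} + z_k^{(j)}
\end{aligned}
\tag{136}
$$

where we used the fact that $\hat{\mathbf{v}}_k^{(j)}$ is orthogonal to the measured channel vectors $\hat{\mathbf{h}}_{k'}^{(j')}$ of all other scheduled users, and we used the decomposition (47). The useful signal coefficient $\big(\hat{\mathbf{h}}_k^{(j)}\big)^{\mathsf{H}} \hat{\mathbf{v}}_k^{(j)}$ is, by construction, equal to the diagonal element corresponding to user $j$ in group $k$ of the matrix $\widehat{\boldsymbol{\Lambda}}^{1/2}$, calculated from $\widehat{\mathbf{H}}_{\boldsymbol{\mu}}$ as in (11). The additional interference term $\big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u}$ is the intra-cluster multiuser interference due to the fact that CSIT is not perfect.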

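A standard technique to lower bound the mutual information $I\big(u_k^{(j)}; y_k^{(j)} \,\big|\, \widehat{\mathbf{H}}\big)$ is as follows:

$$
\begin{aligned}
I\big(u_k^{(j)}; y_k^{(j)} \,\big|\, \widehat{\mathbf{H}}\big) &= h\big(u_k^{(j)}\big) - h\big(u_k^{(j)} \,\big|\, y_k^{(j)}, \widehat{\mathbf{H}}\big) \\
&= \log(\pi e\, q_k) - h\big(u_k^{(j)} - a\, y_k^{(j)} \,\big|\, y_k^{(j)}, \widehat{\mathbf{H}}\big) \\
&\geq \log(\pi e\, q_k) - \mathbb{E}\Big[ \log\Big(\pi e\, \mathrm{Var}\big(u_k^{(j)} - a\, y_k^{(j)} \,\big|\, \widehat{\mathbf{H}}\big)\Big) \Big]
\end{aligned}
\tag{137}
$$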



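where we assumed that $u_k^{(j)}$ is Gaussian with variance $q_k$ (denoting, as before, the transmit power to users in group $k$). The bound holds for any coefficient $a$. In particular, we wish to use the coefficient that minimizes the conditional variance $\mathrm{Var}\big(u_k^{(j)} - a\, y_k^{(j)} \,\big|\, \widehat{\mathbf{H}}\big)$, given by the linear MMSE estimation of $u_k^{(j)}$ from $y_k^{(j)}$ for given $\widehat{\mathbf{H}}$. After standard algebra, omitted here for the sake of brevity, we obtain the variance (conditional MMSE estimation error)

$$
\frac{ q_k \left( \mathbb{E}\Big[ \big| \big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} \big|^2 \,\Big|\, \widehat{\mathbf{H}} \Big] + 1 \right) }{ \big| \big(\hat{\mathbf{h}}_k^{(j)}\big)^{\mathsf{H}} \hat{\mathbf{v}}_k^{(j)} \big|^2 q_k + \mathbb{E}\Big[ \big| \big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} \big|^2 \,\Big|\, \widehat{\mathbf{H}} \Big] + 1 }
\tag{138}
$$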



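Substituting this into (137), we obtain the desired lower bound in the form

$$
I\big(u_k^{(j)}; y_k^{(j)} \,\big|\, \widehat{\mathbf{H}}\big) \geq \mathbb{E}\left[ \log\left( 1 + \frac{ \big| \big(\hat{\mathbf{h}}_k^{(j)}\big)^{\mathsf{H}} \hat{\mathbf{v}}_k^{(j)} \big|^2 q_k }{ 1 + \mathbb{E}\Big[ \big| \big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} \big|^2 \,\Big|\, \widehat{\mathbf{H}} \Big] } \right) \right]
\tag{139}
$$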
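The "standard algebra" linking the linear MMSE error variance to the $\log(1 + \mathrm{SINR})$ form can be illustrated with a small scalar sketch. It assumes a simplified real-valued model $y = \sqrt{g}\,u + w$, where `g` stands for the squared useful signal coefficient, `q` for the symbol variance $q_k$, and $w$ collects the residual interference (variance `interference`) plus unit-variance noise, taken to be uncorrelated with $u$. All function and parameter names are hypothetical and chosen for illustration only.

```python
import math

def error_variance(a, q, g, interference):
    """Var(u - a*y) for y = sqrt(g)*u + w, with Var(u) = q, Var(w) = interference + 1,
    and w uncorrelated with u (real-valued a and sqrt(g) assumed for simplicity)."""
    var_y = g * q + interference + 1.0
    return q - 2.0 * a * math.sqrt(g) * q + a * a * var_y

def lmmse_coefficient(q, g, interference):
    """Coefficient a minimizing Var(u - a*y): Cov(u, y) / Var(y)."""
    return math.sqrt(g) * q / (g * q + interference + 1.0)

def rate_from_mmse(q, g, interference):
    """log(q / MMSE): the entropy difference in the bound, evaluated at the best a."""
    a_star = lmmse_coefficient(q, g, interference)
    return math.log(q / error_variance(a_star, q, g, interference))

def rate_from_sinr(q, g, interference):
    """log(1 + SINR) with SINR = g*q / (1 + interference)."""
    return math.log(1.0 + g * q / (1.0 + interference))
```

For any positive inputs, `rate_from_mmse` and `rate_from_sinr` should agree up to rounding, and moving the coefficient away from `lmmse_coefficient(...)` should only increase `error_variance`, which is why the bound is tightest at the linear MMSE choice.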



Let us examine the terms in (139) separately. As noted above, the coefficient in the numerator of the SINR term inside the logarithm, in the large-system limit, is given by $\big| \big(\hat{\mathbf{h}}_k^{(j)}\big)^{\mathsf{H}} \hat{\mathbf{v}}_k^{(j)} \big|^2 \to \widehat{\Lambda}_k(\boldsymbol{\mu})$, where the latter is obtained via Theorem 1 by replacing the coefficients $\beta_{m,k}$ with the new coefficients defined in (50), thus obtaining the corresponding asymptotic quantities.

The intra-cluster interference term in the denominator can be evaluated as follows. First, notice that, because of the properties of the MMSE estimator, the channel error vector is independent of the channel estimate $\widehat{\mathbf{H}}$. Therefore, conditioning with respect to $\widehat{\mathbf{H}}$ makes $\widehat{\mathbf{V}}_{\boldsymbol{\mu}}$ and the diagonal matrix of transmitted powers $\mathbf{Q}$ act as constant matrices with respect to the conditional expectation, since they are both functions of the CSIT $\widehat{\mathbf{H}}$. We have



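$$
\begin{aligned}
\mathbb{E}\Big[ \big| \big(\mathbf{e}_k^{(j)}\big)^{\mathsf{H}} \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{u} \big|^2 \,\Big|\, \widehat{\mathbf{H}} \Big] &= \mathrm{tr}\Big( \mathbf{Q}\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}}^{\mathsf{H}}\, \mathrm{Cov}\big(\mathbf{e}_k^{(j)}\big)\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \Big) \\
&= \mathrm{tr}\Big( \mathbf{Q}\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}}^{\mathsf{H}}\, \boldsymbol{\Sigma}_k\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}} \Big) \\
&= \mathrm{tr}\Big( \boldsymbol{\Sigma}_k\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}}\, \mathbf{Q}\, \widehat{\mathbf{V}}_{\boldsymbol{\mu}}^{\mathsf{H}} \Big) \\
&= \sum_{m=1}^{B} \bar{\beta}_{m,k}^{2}\, P_m
\end{aligned}
\tag{140}
$$

where the last line follows from the definition of $\boldsymbol{\Sigma}_k$ in (46), which is block-diagonal with $B$ diagonal blocks of dimension $\gamma N \times \gamma N$, whose $m$-th diagonal block is given by $\bar{\beta}_{m,k}^{2} \mathbf{I}$, where $\bar{\beta}_{m,k}$ is defined in (52), and by noticing that $\widehat{\mathbf{V}}_{\boldsymbol{\mu}} \mathbf{Q} \widehat{\mathbf{V}}_{\boldsymbol{\mu}}^{\mathsf{H}}$ is the covariance matrix of the signal transmitted from all the base stations forming the cluster. Under a per-BS power constraint, the partial trace of this matrix on any diagonal segment corresponding to base station $m$ (diagonal segments of length $\gamma N$) is equal to $P_m$. Therefore, the simple form of (140) follows. This concludes the proof.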
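The last step of the trace computation, where a block-diagonal error covariance with scaled-identity blocks turns the trace into a weighted sum of per-base-station transmit powers, can be verified numerically on a toy instance. This is a hedged sketch only: the random beamforming matrix, the names `err_var`, `per_bs_powers`, `interference_direct` and `interference_via_powers`, and the toy dimensions are assumptions made for illustration and do not reproduce the zero-forcing precoder or the power normalization of the paper.

```python
import random

def per_bs_powers(V, q, n_bs, ant_per_bs):
    """Power radiated by each base station: the partial trace of V Q V^H over that
    station's antenna rows, i.e. the sum of q[u] * |V[i][u]|^2 over its rows i."""
    powers = []
    for m in range(n_bs):
        rows = range(m * ant_per_bs, (m + 1) * ant_per_bs)
        powers.append(sum(q[u] * abs(V[i][u]) ** 2
                          for i in rows for u in range(len(q))))
    return powers

def interference_direct(V, q, err_var, ant_per_bs):
    """tr(Q V^H Sigma V) with Sigma = diag(err_var[m] * I), computed term by term."""
    total = 0.0
    for u in range(len(q)):
        # (V^H Sigma V)[u][u] = sum over antennas i of err_var[block of i] * |V[i][u]|^2
        total += q[u] * sum(err_var[i // ant_per_bs] * abs(V[i][u]) ** 2
                            for i in range(len(V)))
    return total

def interference_via_powers(V, q, err_var, ant_per_bs):
    """Weighted sum of per-base-station powers, mirroring the final line of the derivation."""
    P = per_bs_powers(V, q, len(err_var), ant_per_bs)
    return sum(s * p for s, p in zip(err_var, P))

def random_example(seed=1, n_bs=3, ant_per_bs=2, n_users=4):
    """Toy complex beamforming matrix, user powers, and per-station error variances."""
    rng = random.Random(seed)
    V = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_users)]
         for _ in range(n_bs * ant_per_bs)]
    q = [rng.random() + 0.1 for _ in range(n_users)]
    err_var = [rng.random() for _ in range(n_bs)]
    return V, q, err_var
```

With `V, q, err_var = random_example()`, the two functions `interference_direct(V, q, err_var, 2)` and `interference_via_powers(V, q, err_var, 2)` should return the same value up to rounding, for any choice of the toy matrix.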